About the Event
Learning a new object class from cluttered training images is very challenging when the locations of object instances are unknown (the weakly supervised setting). Because of this, previous work has generally required objects to cover a large portion of each image, as in the Caltech4 or Weizmann Horses datasets. In the traditional paradigm, each new class is learned from scratch, without any knowledge other than what was engineered into the system.
In this talk, I will instead explore a scenario where knowledge generic over classes is first learned from images of various classes with given object locations, and then employed to support learning any new class without location annotation. This generic knowledge provides a strong basis that facilitates weakly supervised learning. I will present a novel Conditional Random Field that incorporates generic knowledge and simultaneously localizes object instances while learning an appearance model specific to the new class. As demonstrated experimentally, our approach enables learning from very challenging images containing extensive clutter and large variations in scale and appearance between object instances, such as those in PASCAL VOC 07. We directly evaluate performance as the percentage of object instances correctly localized in their training images, and compare against several existing methods and baselines. To the best of our knowledge, no earlier method has been shown capable of learning from PASCAL VOC 07 in a weakly supervised setting.
During the talk, I will also present in depth the most important component of the proposed generic knowledge: a generic objectness measure, which quantifies how likely it is for an image window to contain an object of any class. It is trained to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. It combines several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. In experiments on PASCAL VOC 07, objectness outperforms state-of-the-art saliency measures. Finally, we give an algorithm that uses objectness to greatly reduce the number of windows that class-specific object detectors need to evaluate.
Vittorio Ferrari is an Assistant Professor at the Swiss Federal Institute of Technology Zurich (ETHZ). After receiving his PhD from ETHZ in 2004, he was a post-doctoral researcher at INRIA Grenoble and the University of Oxford. His research interests are in visual learning, human pose estimation, and image-text correspondences. In 2008 he was awarded a Swiss National Science Foundation Professorship grant for outstanding young researchers. He will be an Area Chair for the International Conference on Computer Vision 2011.