Semantic Structure From Motion (SSFM)
 --- estimating objects and cameras in a scene from images
Sid Yingze Bao and Silvio Savarese, Computer Vision Lab, University of Michigan at Ann Arbor
Sponsorship
    NSF

What is it about?
Semantic Structure from Motion (SSFM) is a new framework for jointly recognizing objects and reconstructing the underlying 3D geometry of a scene (cameras, points and objects). The SSFM framework builds on the intuition that measurements of keypoints and objects must be semantically and geometrically consistent across viewpoints. SSFM has the unique ability to: i) enhance camera pose estimation, compared to feature-point-based SFM algorithms; ii) improve object detections given multiple uncalibrated images, compared to independently detecting objects in single images; and iii) estimate camera poses from object detections only.

Check out our paper for details!


Update
Dec 20, 2011
  • Our paper won the best student paper award at the IEEE Workshop on Challenges and Opportunities in Robot Perception!
Oct 20, 2011
  • Ford Car Dataset is updated. Bugs in the 2D annotations are fixed.
Oct 3, 2011
  • Ford Car Dataset is updated. The 3D point clouds are included. The images are cropped to proper sizes. Object detector models are included.
Sep 29, 2011
  • Kinect Dataset is updated. The list of testing pairs is included. The object detector models for mouse, keyboard, bottle, cup, and monitor are included.
Sep 9, 2011
  • Kinect Dataset is uploaded! You can find 3D point clouds aligned with 2D images.
Jun 19, 2011
  • Detector model files for the Ford Car Dataset are uploaded.
Jun 7, 2011
  • The webpage is alive!
  • Source code version 0.1 is posted. This version might contain bugs. Please email me so we can make the software package better!
  • Car dataset is uploaded.
  • I will reply to all related emails promptly.






Who might be interested?
Researchers who are familiar with the keywords -- Structure From Motion, Object Detection, Scene Understanding -- may find SSFM useful as a state-of-the-art baseline to compare against.




Papers and Citations

  • S. Yingze Bao and S. Savarese, Semantic Structure from Motion, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2011
    download pdf and bibtex
  • S. Yingze Bao, M. Bagra, S. Savarese, Semantic Structure From Motion with Object and Point Interactions, IEEE Workshop on Challenges and Opportunities in Robot Perception (in conjunction with ICCV-11)
    download pdf and bibtex. Winner of the best student paper award.



    About the authors
    Sid Yingze Bao is a PhD student building an automatic system that can interpret a scene's 3D structure and understand the semantic labels of the scene's components.



    Results
    Below are three YouTube videos highlighting SSFM. We show the input images, the limitations of the single-image object recognition approach, and the final result of SSFM (improved 2D object recognition and a reconstructed 3D scene). For technical details, please refer to our paper.



    Video credits: Mohit Bagra
    Source Code
    Dataset
    • Kinect dataset (testing set) (version 0.2, date: Sep 29). See README.txt. Email the author to obtain the training set. Object detector models are included.
    • Ford Car Dataset (version 0.3, date: Oct 20). This dataset is a joint effort of Pandey et al. (who collected the images, LiDAR points, calibration, etc.) and us (who annotated the 2D and 3D objects). Please cite both papers if you appreciate the authors' efforts.

    Last update: 1/27/12