About the Event
Estimating the 3D structure of a scene and recognizing scene elements are two core capabilities underlying many artificial intelligence applications. Achieving both goals using only RGB images as input is highly valuable for low-cost AI systems, but it is also extremely challenging. A scene may comprise a large number of points, regions, and objects. Identifying them and distinguishing their semantic properties from input images correspond to two research topics in computer vision: geometric scene understanding and semantic scene understanding. Over the past decades, many researchers have devoted themselves to geometric scene understanding, for example through work on camera calibration, structure-from-motion, and dense reconstruction. Meanwhile, numerous others have studied semantic scene understanding, including work on object recognition, region segmentation, and layout estimation. However, solving the geometric and semantic understanding problems in isolation usually leads to limited estimation capability and recognition accuracy.
In this thesis, I propose a novel image-based framework that jointly solves the geometric and semantic scene understanding problems, covering the complete process of recognizing elements in a scene, estimating their spatial properties, and identifying their mutual relationships. Recognizing components in a scene provides constraints for estimating the scene's geometric structure, while the estimated geometric structure in turn greatly aids recognition by providing contextual information and pruning out impossible configurations of scene components. Experiments show that solving the geometric and semantic understanding problems jointly achieves significantly higher accuracy than solving them separately.
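To make the mutual-constraint idea concrete, here is a minimal toy sketch of joint inference over a region's semantic label and its surface orientation. All labels, scores, and the compatibility table are invented for illustration and are not taken from the thesis; the point is only that a joint score can prune configurations that independent per-task decisions would accept.

```python
# Hypothetical toy example: joint vs. independent inference over a
# semantic label and a geometric hypothesis for one image region.
# Scores and labels are made up for illustration.

# Independent recognition scores for candidate semantic labels.
semantic_scores = {"wall": 0.6, "floor": 0.5, "ceiling": 0.4}

# Independent geometric scores for candidate surface orientations.
geometry_scores = {"horizontal": 0.7, "vertical": 0.3}

# Compatibility prior: e.g. a wall cannot lie on a horizontal plane.
# Pairs absent from this table get compatibility 0 (impossible).
compatible = {
    ("floor", "horizontal"): 1.0,
    ("ceiling", "horizontal"): 1.0,
    ("wall", "vertical"): 1.0,
}

def independent_inference():
    """Pick each answer separately, ignoring cross-task constraints."""
    label = max(semantic_scores, key=semantic_scores.get)
    orientation = max(geometry_scores, key=geometry_scores.get)
    return (label, orientation)

def joint_inference():
    """Pick the (label, orientation) pair maximizing the joint score."""
    pairs = [(l, o) for l in semantic_scores for o in geometry_scores]
    return max(
        pairs,
        key=lambda p: semantic_scores[p[0]]
        * geometry_scores[p[1]]
        * compatible.get(p, 0.0),
    )

# Independent decisions pick ("wall", "horizontal"), an impossible
# configuration; joint inference instead settles on ("floor", "horizontal").
print(independent_inference())
print(joint_inference())
```

Here the joint objective rejects the separately best answers because their combination is geometrically impossible, mirroring how the framework lets geometry prune semantic hypotheses and vice versa.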