Defense Event

Scale-Adaptive Video Understanding

Chenliang Xu

Friday, July 15, 2016
12:30pm - 2:30pm
3316 EECS Bldg.

About the Event

To reach the next level in capability, computer systems relying on visual perception need to understand not only what action is happening in a video, but also who is performing the action and where it is happening. This capability is increasingly critical for extracting semantics from videos and, ultimately, for interacting with humans in our complex world. However, achieving this goal is nontrivial: context in video varies in both spatial and temporal scale, and how to choose the right scale for efficient video understanding remains an open question.

In this talk, I will introduce a comprehensive set of methods for adapting scale during video understanding. I will start with a streaming video segmentation framework that generates a hierarchy of multi-scale decompositions for videos of arbitrary length. I will then present two methods that address the scale selection problem in this hierarchical representation. The first flattens the entire hierarchy into a single segmentation using quadratic integer programming, balancing the relative amount of information across the field of view. We show that it is possible to adaptively select the scales of video content based on various post hoc feature criteria, such as motion-ness and object-ness. The second combines the segmentation hierarchy with a local CRF for the task of localizing and recognizing actors and actions in video. It defines a dynamic, continuous process of information exchange: the local CRF influences which scales are active in the hierarchy, and these active scales, in turn, influence the connectivity of the CRF. Experiments on a large-scale video dataset demonstrate the effectiveness of explicitly considering scale selection in video understanding.
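To give a flavor of the flattening step described above: the talk formulates it as a quadratic integer program with pairwise terms, but in the simpler case where each segment carries an independent score (e.g., a motion-ness criterion), choosing the best single-scale partition reduces to bottom-up dynamic programming on the hierarchy tree. The sketch below illustrates only that simplified case; all node names and scores are hypothetical.

```python
# Simplified sketch of flattening a segmentation hierarchy into one
# segmentation. Assumes independent per-segment scores (the linear special
# case); the actual method in the talk uses quadratic integer programming.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    score: float                 # illustrative criterion, e.g. motion-ness
    children: List["Node"] = field(default_factory=list)

def flatten(node: Node) -> Tuple[float, List[str]]:
    """Return (best total score, chosen segments) for this subtree:
    either keep the node as one coarse segment, or recurse into children."""
    if not node.children:
        return node.score, [node.name]
    child_total, child_segs = 0.0, []
    for c in node.children:
        s, segs = flatten(c)
        child_total += s
        child_segs += segs
    if node.score >= child_total:
        return node.score, [node.name]   # coarse scale wins here
    return child_total, child_segs       # finer scales win here

# Toy hierarchy: one coarse region that splits into two finer ones.
root = Node("frame", 1.0, [Node("actor", 0.75), Node("background", 0.5)])
best_score, segmentation = flatten(root)
print(best_score, segmentation)  # → 1.25 ['actor', 'background']
```

Because the children's combined score (1.25) exceeds the coarse node's score (1.0), the finer scale is selected for that region; with pairwise terms between segments, as in the actual formulation, this greedy decomposition no longer applies and an integer program is solved instead.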

Additional Information

Sponsor(s): CSE

Open to: Public