Efficient Multi-level Scene Understanding in Videos



Efficient Multi-level Scene Understanding in Videos



Author: Buyu Liu

language: en

Publisher:

Release Date: 2016







Automatic video parsing is a key step towards human-level dynamic scene understanding, and a fundamental problem in computer vision. A core issue in video understanding is to infer multiple scene properties of a video in an efficient and consistent manner. This thesis addresses the problem of holistic scene understanding from monocular videos, which jointly reasons about semantic and geometric scene properties at multiple levels, including pixelwise annotation of video frames, object instance segmentation in the spatio-temporal domain, and/or scene-level description in terms of scene categories and layouts. We focus on four main issues in holistic video understanding: 1) what is the representation for consistent semantic and geometric parsing of videos? 2) how do we integrate high-level reasoning (e.g., objects) with pixel-wise video parsing? 3) how can we do efficient inference for multi-level video understanding? and 4) what is the representation learning strategy for efficient/cost-aware scene parsing? We discuss three multi-level video scene segmentation scenarios based on different aspects of scene properties and efficiency requirements.

The first case addresses the problem of consistent geometric and semantic video segmentation for outdoor scenes. We propose a geometric scene layout representation, or a stage scene model, to efficiently capture the dependency between the semantic and geometric labels. We build a unified conditional random field for joint modeling of the semantic class, geometric label and the stage representation, and design an alternating inference algorithm to minimize the resulting energy function.

The second case focuses on the problem of simultaneous pixel-level and object-level segmentation in videos. We propose to incorporate foreground object information into pixel labeling by jointly reasoning about the semantic labels of supervoxels, object instance tracks and geometric relations between objects. In order to model objects, we take an exemplar approach based on a small set of object annotations to generate a set of object proposals. We then design a conditional random field framework that jointly models the supervoxel labels and object instance segments. To scale up our method, we develop an active inference strategy to improve the efficiency of multi-level video parsing, which adaptively selects an informative subset of object proposals and performs inference on the resulting compact model.

The last case explores the problem of learning a flexible representation for efficient scene labeling. We propose a dynamic hierarchical model that allows us to achieve flexible trade-offs between efficiency and accuracy. Our approach incorporates the cost of feature computation and model inference, and optimizes the model performance for any given test-time budget. We evaluate all our methods on several publicly available video and image semantic segmentation datasets, and demonstrate superior performance in efficiency and accuracy.
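The abstract does not give the form of the energy function or the details of the alternating inference algorithm. As a rough illustration of the general idea only, the sketch below alternates between two label sets on a toy energy: one semantic label and one geometric label per pixel, each with a unary cost, plus a compatibility cost that couples the two. The energy form, the function names (`energy`, `alternating_inference`) and all parameters are assumptions made for this example, not the thesis's model, and the stage representation and any spatial or temporal smoothness terms are left out.

```python
def energy(sem, geo, unary_sem, unary_geo, compat):
    """Toy joint energy (an assumed form, not the thesis's):
    per-pixel unary costs plus a semantic/geometric compatibility cost."""
    return sum(
        unary_sem[i][s] + unary_geo[i][g] + compat[s][g]
        for i, (s, g) in enumerate(zip(sem, geo))
    )


def alternating_inference(unary_sem, unary_geo, compat, max_iters=10):
    """Block coordinate descent: update semantic labels with geometric
    labels fixed, then geometric labels with semantic labels fixed."""
    n = len(unary_sem)
    n_sem, n_geo = len(compat), len(compat[0])

    # Initialise each label set from its own unary costs only.
    sem = [min(range(n_sem), key=lambda s: unary_sem[i][s]) for i in range(n)]
    geo = [min(range(n_geo), key=lambda g: unary_geo[i][g]) for i in range(n)]
    history = [energy(sem, geo, unary_sem, unary_geo, compat)]

    for _ in range(max_iters):
        # Step 1: best semantic label per pixel given the current geometry.
        sem = [
            min(range(n_sem), key=lambda s: unary_sem[i][s] + compat[s][geo[i]])
            for i in range(n)
        ]
        # Step 2: best geometric label per pixel given the new semantics.
        geo = [
            min(range(n_geo), key=lambda g: unary_geo[i][g] + compat[sem[i]][g])
            for i in range(n)
        ]
        history.append(energy(sem, geo, unary_sem, unary_geo, compat))
        # Stop once a full sweep no longer lowers the energy.
        if history[-1] >= history[-2]:
            break

    return sem, geo, history
```

Each step minimizes the energy exactly over one block of variables while the other is held fixed, so the energy never increases from one sweep to the next. In this toy form the problem decomposes per pixel and could be solved by direct enumeration; alternation becomes useful once terms such as smoothness or a shared layout variable tie pixels together.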

Computer Vision -- ECCV 2014



Author: David Fleet

language: en

Publisher: Springer

Release Date: 2014-08-14







The seven-volume set comprising LNCS volumes 8689-8695 constitutes the refereed proceedings of the 13th European Conference on Computer Vision, ECCV 2014, held in Zurich, Switzerland, in September 2014. The 363 revised papers presented were carefully reviewed and selected from 1444 submissions. The papers are organized in topical sections on tracking and activity recognition; recognition; learning and inference; structure from motion and feature matching; computational photography and low-level vision; vision; segmentation and saliency; context and 3D scenes; motion and 3D scene analysis; and poster sessions.

Computer Vision – ECCV 2024



Author: Aleš Leonardis

language: en

Publisher: Springer Nature

Release Date: 2024-11-16







The multi-volume set of LNCS books with volume numbers 15059 to 15147 constitutes the refereed proceedings of the 18th European Conference on Computer Vision, ECCV 2024, held in Milan, Italy, during September 29–October 4, 2024. The 2387 papers presented in these proceedings were carefully reviewed and selected from a total of 8585 submissions. The papers deal with topics such as computer vision; machine learning; deep neural networks; reinforcement learning; object recognition; image classification; image processing; object detection; semantic segmentation; human pose estimation; 3D reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; and motion estimation.