Visual Object Tracking In Dynamic Scenes

Visual object tracking is a fundamental task in the field of computer vision. It is widely used in numerous applications including, but not limited to, video surveillance, image understanding, robotics, and human-computer interaction. In essence, visual object tracking is the problem of estimating the state/trajectory of an object of interest over time. Unlike tasks such as object detection, where the classes/categories are defined beforehand, the only information available about the object of interest is given in the first frame. Even though Deep Learning (DL) has revolutionised most computer vision tasks, visual object tracking still poses several challenges. The task is stochastic in nature, with no prior knowledge about the object of interest available during training or testing/inference. Moreover, visual object tracking is a class-agnostic task, as opposed to object detection and segmentation. The main objective of this thesis is to develop and advance visual object trackers using novel deep learning frameworks and mathematical formulations.

To take advantage of different trackers, a novel framework is developed to track moving objects based on a composite framework and a reporter mechanism. The composite framework combines built-in trackers and user-defined trackers to track the object of interest; it contains a module that calculates the robustness of each tracker, and the reporter mechanism serves as a recovery mechanism when the trackers fail to locate the object of interest.

Since individual trackers may still fail, a more robust framework based on a Siamese network architecture, namely DensSiam, is proposed. DensSiam uses the concept of dense layers, connecting each dense layer to all following layers in a feed-forward fashion through a similarity-learning function, and includes a self-attention mechanism that forces the network to pay more attention to non-local features during offline training.

Generally, Siamese trackers do not fully utilize the semantic and objectness information of pre-trained networks trained on an image classification task. To solve this problem, a novel architecture, dubbed DomainSiam, is proposed to learn a domain-aware network that fully utilizes semantic and objectness information while producing a class-agnostic output through a ridge regression network. Moreover, to reduce the sparsity problem, the ridge regression problem is solved with a differentiable weighted-dynamic loss function.

Siamese trackers are fast and run in real time, but they lack high accuracy. To overcome this challenge, a novel dynamic policy-gradient agent-environment architecture with a Siamese network (DP-Siam) is proposed to train the tracker to increase the accuracy and the expected average overlap while still running in real time. DP-Siam is trained offline with reinforcement learning to produce a continuous action that predicts the optimal object location.

Finally, a common design block in most object trackers in the literature is the backbone network, which is trained in the feature space. A novel framework, called NullSpaceNet, is proposed to design a backbone network that maps from the feature space to another space (a joint-nullspace) that is more suitable for object tracking and classification. NullSpaceNet has a clear interpretation of its feature representation, and the features in this space are more separable.
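The Siamese trackers described above (DensSiam, DomainSiam, DP-Siam) share one computational core: a similarity function that correlates an embedded target template with an embedded search region. The following is a minimal sketch of that step in PyTorch; the function name, tensor shapes, and the use of plain cross-correlation are illustrative assumptions, not the thesis's exact architecture:

```python
import torch
import torch.nn.functional as F

def siamese_response(template_feat: torch.Tensor,
                     search_feat: torch.Tensor) -> torch.Tensor:
    """Cross-correlate template features with search-region features.

    template_feat: (C, Ht, Wt) embedding of the target from the first frame.
    search_feat:   (C, Hs, Ws) embedding of the current search region.
    Returns a (1, 1, Hs-Ht+1, Ws-Wt+1) response map whose peak marks the
    most likely target location.
    """
    # Using the template as a convolution kernel over the search region
    # implements the similarity function of fully-convolutional Siamese
    # trackers.
    return F.conv2d(search_feat.unsqueeze(0),    # input:  (1, C, Hs, Ws)
                    template_feat.unsqueeze(0))  # weight: (1, C, Ht, Wt)
```

At inference time the tracker simply takes the argmax of this response map and maps it back to image coordinates; the differences between the trackers above lie in how the embeddings are learned, not in this correlation step.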
NullSpaceNet is then utilized in object tracking by regularizing the discriminative joint-nullspace backbone network. The resulting tracker, dubbed NullSpaceRDAR, encourages the network to learn a representation of the target-specific information for the object of interest in the joint-nullspace. This contrasts with the feature space, where objects from a specific class are grouped into one category and the representation is therefore insensitive to intra-class variations. In the regularized discriminative joint-nullspace, features from the same target are collapsed into one point and features from different targets are collapsed into different points; consequently, the joint-nullspace forces the network to be sensitive to variations among objects of the same class (intra-class variations). Moreover, a dynamic adaptive loss function is proposed that selects the suitable loss function from a super-set family of losses based on the training data, making NullSpaceRDAR more robust to different challenges.
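To make the idea of selecting from a super-set family of losses concrete, here is one minimal, hypothetical sketch in PyTorch. The candidate set and the selection rule (follow the loss with the smallest current value) are assumptions chosen for illustration; the thesis's actual dynamic adaptive criterion is not reproduced here:

```python
import torch
import torch.nn.functional as F

def dynamic_adaptive_loss(pred: torch.Tensor,
                          target: torch.Tensor) -> torch.Tensor:
    """Select a loss from a small candidate family based on the data.

    Hypothetical sketch: evaluate each candidate on the current batch
    and back-propagate through whichever fits the current error
    distribution best (here, simply the one with the smallest value).
    """
    candidates = [
        F.l1_loss(pred, target),         # robust to outliers
        F.mse_loss(pred, target),        # penalizes large errors
        F.smooth_l1_loss(pred, target),  # compromise between the two
    ]
    # torch.stack keeps the computation graph, so the selected loss
    # remains differentiable for training.
    return torch.stack(candidates).min()
```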
Visual Object Tracking

This book delves into visual object tracking (VOT), a fundamental aspect of computer vision crucial for replicating human dynamic vision, with applications ranging from self-driving vehicles to surveillance systems. Despite significant strides propelled by deep learning, challenges such as target deformation and motion persist, exposing a disparity between cutting-edge VOT systems and human performance. This observation underscores the necessity to thoroughly scrutinize and enhance evaluation methodologies within VOT research. Hence, the primary objective of this book is to equip readers with essential insights into dynamic visual tasks encapsulated by VOT. Beginning with the elucidation of task definitions, it integrates interdisciplinary perspectives on evaluation techniques. The book is organized into five parts, tracing the evolution of VOT from perceptual to cognitive intelligence, exploring the experimental frameworks utilized in assessments, analyzing the various agents involved, including tracking algorithms and human visual tracking, and dissecting evaluation mechanisms through both machine–machine and human–machine comparisons. Furthermore, it examines the trend toward crafting more human-like task definitions and comprehensive evaluation frameworks to effectively gauge machine intelligence. This book serves as a roadmap for researchers aiming to grasp the bottlenecks in VOT capabilities and comprehend the gaps between current methodologies and human abilities, all geared toward advancing algorithmic intelligence. It also delves into the realm of data-centric AI, emphasizing the pivotal role of high-quality datasets and evaluation systems in the age of large language models (LLMs). Such systems are indispensable for training AI models while ensuring their safety and reliability. Utilizing VOT as a case study, the book offers detailed insights into these facets of data-centric AI research. Designed to cater to readers with foundational knowledge in computer vision, it employs diagrams and examples to facilitate comprehension, providing essential groundwork for understanding key technical components.
Computer Vision -- ECCV 2010

Author: Kostas Daniilidis
language: en
Publisher: Springer Science & Business Media
Release Date: 2010-08-30
The six-volume set comprising LNCS volumes 6311 through 6316 constitutes the refereed proceedings of the 11th European Conference on Computer Vision, ECCV 2010, held in Heraklion, Crete, Greece, in September 2010. The 325 revised papers presented were carefully reviewed and selected from 1174 submissions. The papers are organized in topical sections on object and scene recognition; segmentation and grouping; face, gesture, biometrics; motion and tracking; statistical models and visual learning; matching, registration, alignment; computational imaging; multi-view geometry; image features; video and event characterization; shape representation and recognition; stereo; reflectance, illumination, color; and medical image analysis.