Learning Data-Driven Representations for Robust Monocular Computer Vision Applications


Herdtweck, C.
Department Human Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Project group: Cognitive Engineering, Max Planck Institute for Biological Cybernetics, Max Planck Society;


Herdtweck, C. (2013). Learning Data-Driven Representations for Robust Monocular Computer Vision Applications. PhD Thesis, Eberhard-Karls-Universität, Tübingen, Germany.

Cite as: http://hdl.handle.net/11858/00-001M-0000-001A-1377-9
For computer vision applications, one crucial step is the choice of a suitable representation of image data. Learning such representations from observed data with machine learning methods has allowed computer vision applications to be deployed in a wider range of everyday scenarios. This work presents three new representations for applications using data from a single camera, together with algorithms for learning them from training data.

The first two representations are applied to image sequences taken by a single camera mounted in a moving vehicle. Computing optical flow and representing the resulting vector field as a point in a learned linear subspace greatly simplifies the interpretation of the flow. It allows not only estimation of the vehicle's self-motion by means of a learned linear mapping, but also identification of independently moving objects and erroneous flow vectors, and it copes with missing vectors in homogeneous image regions.

The second representation builds on work in object detection and circular statistics to estimate the orientation of observed objects. Orientation knowledge is represented as a multi-modal probability distribution over a circular space, which captures ambiguities in the mapping from appearance to orientation. This ambiguity can be resolved in further processing steps; the use of a particle filter for temporal integration and consistent orientation tracking is presented. Extending the filtering framework to include object position, orientation, speed, and front-wheel angle yields improved tracking of other vehicles observed from a moving camera.

The third new representation aims at capturing the gist of an image, mimicking the first stages of human visual processing. Formed after only a few hundred milliseconds, this gist is the basis for further visual processing.
By combining algorithms for surface orientation estimation, object detection, scene type classification, and viewpoint estimation with general world knowledge in an iterative fashion, the proposed algorithm tries to form a consistent, general-purpose representation of a single image. Several psychophysical experiments show that the horizon is part of this visual gist in humans, and that several cues are important for its estimation by both humans and machines.
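The flow-subspace idea from the abstract can be illustrated with a minimal sketch: flattened flow fields are projected onto a linear subspace learned with PCA, and self-motion is recovered by a least-squares linear mapping on the subspace coefficients. All data, dimensions, and variable names below are synthetic and illustrative; this is not the thesis's actual pipeline, only an assumed instance of the general technique.

```python
# Hypothetical sketch: flow fields as points in a learned linear subspace,
# with a learned linear mapping from subspace coefficients to self-motion.
# All data is synthetic; dimensions and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Training set: N flow fields of D components each, generated by K latent
# motion parameters through an unknown linear process plus small noise.
N, D, K = 500, 200, 3
motion = rng.normal(size=(N, K))              # true self-motion parameters
A = rng.normal(size=(K, D))                   # unknown flow generator
flows = motion @ A + 0.01 * rng.normal(size=(N, D))

# Learn the subspace via PCA: top-K right singular vectors of centered data.
mean = flows.mean(axis=0)
U, S, Vt = np.linalg.svd(flows - mean, full_matrices=False)
basis = Vt[:K]                                # K x D subspace basis

# Represent each flow field by its K subspace coefficients.
coeffs = (flows - mean) @ basis.T             # N x K

# Learn the linear mapping from coefficients to self-motion (least squares).
W, *_ = np.linalg.lstsq(coeffs, motion, rcond=None)

# Estimate self-motion for a new, unseen flow field.
new_motion = rng.normal(size=(1, K))
new_flow = new_motion @ A
est = ((new_flow - mean) @ basis.T) @ W       # should be close to new_motion
```

The same representation also supports the outlier handling mentioned in the abstract: the reconstruction residual `flows - (coeffs @ basis + mean)` is large exactly where a flow vector is inconsistent with the subspace, e.g. on independently moving objects.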
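The second representation, a multi-modal distribution over a circular orientation space, can likewise be sketched with a weighted particle set, as a particle filter would maintain. The mode locations, concentration, and helper names below are assumptions for illustration: an ambiguous appearance (e.g. front/back confusion of a car) produces two modes 180° apart, and circular statistics reveal that collapsing the belief to a single angle would be misleading.

```python
# Hypothetical sketch: orientation belief as a multi-modal distribution on
# the circle, approximated by a weighted particle set. Modes and parameters
# are illustrative assumptions, not values from the thesis.
import numpy as np

rng = np.random.default_rng(1)

# Two-mode belief: the object faces either 30 deg or 210 deg
# (front/back ambiguity typical of appearance-based orientation estimates).
modes = np.deg2rad([30.0, 210.0])
particles = np.concatenate(
    [rng.vonmises(mu=m, kappa=20.0, size=200) for m in modes]
)
weights = np.full(particles.size, 1.0 / particles.size)

def circular_mean(theta, w):
    """Weighted circular mean angle of the particle set."""
    return np.arctan2(np.sum(w * np.sin(theta)), np.sum(w * np.cos(theta)))

def resultant_length(theta, w):
    """Mean resultant length R in [0, 1]; small R signals an ambiguous,
    multi-modal belief that must not be collapsed to one angle."""
    return np.hypot(np.sum(w * np.sin(theta)), np.sum(w * np.cos(theta)))

R = resultant_length(particles, weights)
# R is near 0 here: the two opposite modes cancel in the resultant vector,
# so the circular mean alone would be meaningless for this belief.
```

This is why a multi-modal representation matters: temporal integration (e.g. observing the object turn) can later concentrate the particles on one mode, at which point the circular mean becomes a reliable point estimate.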