  Spatio-Temporal Feature Extraction for Action Recognition in Videos

Mahler, L. (2020). Spatio-Temporal Feature Extraction for Action Recognition in Videos. Bachelor Thesis, Technische Hochschule Ulm, Ulm, Germany. doi:10.13140/RG.2.2.36624.23046.

Creators

 Creators:
Mahler, L. (1), Author
Affiliations:
(1) External Organizations, ou_persistent22

Content

Free keywords: -
 Abstract: Autonomous robots and vehicles that act primarily in environments inhabited by humans must recognize human actions from incoming sensory video data in order to form a complete scene understanding. This scene understanding is necessary for all high-level functionality of the autonomous machine. Motivated by this, this work reviews several methods for human action recognition and detection in videos. To recognize or detect actions, the extraction of spatio-temporal features is a prerequisite. The good performance of 2D CNNs in action recognition shows that the spatial dimension contains cues about the actions present. 3D CNNs, which convolve the spatial and temporal dimensions symmetrically, improve on 2D CNNs only moderately, which suggests that the temporal dimension requires special treatment. This thesis takes inspiration from visual processing in mammalian brains to select a very deep two-stream architecture, SlowFast, in which the two streams represent the spatial and temporal dimensions respectively. Furthermore, this thesis empirically demonstrates that the chosen SlowFast architecture has exceptional spatio-temporal modeling capabilities. This is supported by the high parameter utilization of SlowFast. Low runtime cost, high throughput, and state-of-the-art accuracy underline the representational power of the architecture. Moreover, the methodical analysis reveals that hierarchical processing in SlowFast is a significant contributor to its performance. The two streams achieve high functional specialization for the tasks of modeling motion and modeling form while sharing the same underlying computational principles. Channel capacity and temporal resolution are shown to be largely responsible for achieving this functional specialization.

Details

Language(s):
 Dates: 2020-09-28, 2020
 Publication Status: Issued
 Pages: -
 Publishing info: Ulm, Germany : Technische Hochschule Ulm
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.13140/RG.2.2.36624.23046
 Degree: Bachelor
