Free keywords:
-
Abstract:
Autonomous robots and vehicles that act primarily in environments inhabited by humans must recognize human actions from incoming sensory video data in order to form a complete scene understanding. This scene understanding is necessary for all high-level functionality of the autonomous machine. Motivated by this, this work reviews several methods for human action recognition and detection in videos. To recognize or detect actions, the extraction of spatio-temporal features is a prerequisite. The good performance of 2D CNNs in the domain of action recognition has shown that the spatial dimension carries indications of the present actions. 3D CNNs, which convolve the spatial and temporal dimensions symmetrically, improve on 2D CNNs only moderately, hinting that the temporal dimension requires special treatment. This thesis takes inspiration from visual processing in mammalian brains to select a very deep two-stream architecture, SlowFast, whose two streams address the spatial and temporal dimensions respectively. Furthermore, this thesis empirically demonstrates that the chosen SlowFast architecture has exceptional spatio-temporal modeling capabilities, as evidenced by its high parameter utilization. Low runtime cost, high throughput, and state-of-the-art accuracy underline the representational power of the architecture. Moreover, the methodical analysis reveals that hierarchical processing in SlowFast is a significant contributor to its performance. The two streams achieve high functional specialization for modeling motion and form, respectively, from the same underlying computational principles. Channel capacity and temporal resolution are shown to be largely responsible for this functional specialization.
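The two-pathway principle summarized above can be illustrated with a minimal sketch (illustrative only, not the thesis's implementation; the names `ALPHA`, `BETA`, and `split_pathways` are assumptions): the Slow pathway samples frames at a low temporal rate but with high channel capacity, while the Fast pathway keeps full temporal resolution with a fraction of the channels.

```python
import numpy as np

# Hypothetical sketch of the SlowFast two-pathway input split.
# ALPHA: frame-rate ratio between the Fast and Slow pathways.
# BETA: fraction of the Slow pathway's channels used by the Fast pathway.
ALPHA = 8
BETA = 1 / 8
SLOW_CHANNELS = 64

def split_pathways(video, alpha=ALPHA):
    """Temporally subsample a clip of shape (T, H, W, C) for the two pathways."""
    fast = video          # full temporal resolution for motion modeling
    slow = video[::alpha] # every alpha-th frame for form (spatial) modeling
    return slow, fast

# A toy clip: 32 frames of 8x8 RGB pixels
clip = np.zeros((32, 8, 8, 3))
slow, fast = split_pathways(clip)
print(slow.shape[0], fast.shape[0])        # → 4 32
print(int(SLOW_CHANNELS * BETA))           # → 8 (Fast pathway channel width)
```

The asymmetry in `ALPHA` and `BETA` mirrors the abstract's point that temporal resolution and channel capacity drive the functional specialization of the two streams.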