Conference Paper

Combining appearance and motion for human action classification in videos

MPS-Authors
Nowozin, S
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Lampert, C
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Dhillon, P., Nowozin, S., & Lampert, C. (2009). Combining appearance and motion for human action classification in videos. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 22-29). Piscataway, NJ, USA: IEEE Service Center.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C473-3
Abstract
An important cue for high-level scene understanding is the analysis of the objects in a scene, their behavior, and their interactions. In this paper, we study the problem of classifying activities in videos, an integral component of any scene understanding system, and present a novel approach for recognizing human action categories in videos by combining information from the appearance and motion of human body parts. Our approach tracks human body parts using mixture particle filters and then clusters the particles using local non-parametric clustering, thereby associating a local set of particles with each cluster mode. The trajectories of these cluster modes provide the “motion” information, and the “appearance” information is provided by statistics of the relative motion of each local set of particles over a number of frames. We then use a “Bag of Words” model to build one histogram per video sequence from the set of these robust appearance and motion descriptors. These histograms capture characteristic information that helps discriminate among human actions, which in turn supports a better understanding of the complete scene. We tested our approach on the standard KTH and Weizmann human action datasets, and the results were comparable to those of state-of-the-art methods. Additionally, our approach is able to distinguish activities that involve motion of the complete body from those in which only certain body parts move. In other words, our method discriminates well between activities with “global body motion”, such as running and jogging, and those with “local motion”, such as waving and boxing.
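The “Bag of Words” step in the abstract turns a variable number of per-video descriptors into one fixed-length histogram. The following is a minimal sketch of that generic technique, not the authors' implementation. It assumes descriptors are plain numeric tuples, builds the visual codebook with ordinary k-means (an assumption; the abstract does not say how the vocabulary is learned), and assigns each descriptor to its nearest codeword by Euclidean distance. The function names `kmeans` and `bow_histogram` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)), key=lambda j: squared_distance(vector, codebook[j]))


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Hypothetical codebook builder: plain Lloyd's k-means.

    For simplicity (an assumption of this sketch), the initial centroids are
    the first k points, which makes the result deterministic.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                updated.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid unchanged
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """One normalized histogram over codewords for a single video's descriptors."""
    counts = Counter(nearest(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts.get(j, 0) / total if total else 0.0 for j in range(len(codebook))]


if __name__ == "__main__":
    # Toy 2-D "descriptors" pooled from training videos (hypothetical data).
    training = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
    codebook = kmeans(training, k=2)
    # Descriptors from one test video become a single fixed-length vector.
    print(bow_histogram([(0.0, 0.1), (5.0, 5.0), (5.2, 5.1)], codebook))
```

Normalizing by the descriptor count makes histograms comparable across videos of different lengths; the resulting vectors can then be passed to any standard classifier.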