English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Motion history images for online speaker/signer diarization

MPS-Authors
/persons/resource/persons4454

Gebre,  Binyam Gebrekidan
The Language Archive, MPI for Psycholinguistics, Max Planck Society;

/persons/resource/persons216

Wittenburg,  Peter
The Language Archive, MPI for Psycholinguistics, Max Planck Society;

/persons/resource/persons39383

Drude,  Sebastian
The Language Archive, MPI for Psycholinguistics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Gebre, B. G., Wittenburg, P., Heskes, T., & Drude, S. (2014). Motion history images for online speaker/signer diarization. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 1537-1541). Piscataway, NJ: IEEE.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0015-1AC4-2
Abstract
We present a solution to the problem of online speaker/signer diarization - the task of determining "who spoke/signed when?". Our solution is based on the idea that gestural activity (hands and body movement) is highly correlated with uttering activity. This correlation is necessarily true for sign languages and mostly true for spoken languages. The novel part of our solution is the use of motion history images (MHI) as a likelihood measure for probabilistically detecting uttering activities. MHI is an efficient representation of where and how motion occurred for a fixed period of time. We conducted experiments on 4.9 hours of a publicly available dataset (the AMI meeting data) and 1.4 hours of sign language dataset (Kata Kolok data). The best performance obtained is 15.70% for sign language and 31.90% for spoken language (measurements are in DER). These results show that our solution is applicable in real-world applications like video conferences.