  Speaker diarization using gesture and speech

Gebre, B. G., Wittenburg, P., Drude, S., Huijbregts, M., & Heskes, T. (2014). Speaker diarization using gesture and speech. In H. Li, & P. Ching (Eds.), Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 582-586).


Files

interspeech_paper.pdf (Preprint), 950KB
Name: interspeech_paper.pdf
Description: -
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata: -
Copyright Date: -
Copyright Info: -
License: -

Creators

Creators:
Gebre, Binyam Gebrekidan (1), Author
Wittenburg, Peter (1), Author
Drude, Sebastian (1), Author
Huijbregts, Marijn (2), Author
Heskes, Tom (2), Author
Affiliations:
(1) The Language Archive, MPI for Psycholinguistics, Max Planck Society, ou_530892
(2) Radboud University, ou_persistent22

Content

Free keywords: speaker diarization, gestures, speaker recognition, gaussian mixture models, motion history images
Abstract: We demonstrate how the problem of speaker diarization can be solved using both gesture and speaker parametric models. The novelty of our solution is that we approach the speaker diarization problem as a speaker recognition problem after learning speaker models from speech samples corresponding to gestures (the occurrence of gestures indicates the presence of speech, and the location of gestures indicates the identity of the speaker). This approach offers several advantages: performance comparable to the state of the art, faster computation, and greater adaptability. In our implementation, parametric models are used to model speakers' voices and their gestures: more specifically, Gaussian mixture models are used to model the voice characteristics of each person and of all persons, and gamma distributions are used to model gestural activity based on features extracted from Motion History Images. Tests on 4.24 hours of the AMI meeting data show that our solution yields DER improvements of 19% on speech-only segments and 4% on all segments including silence (the comparison is with the AMI system).
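
The abstract describes a two-stage idea: gestural activity, modeled with gamma distributions over Motion History Image features, indicates who is speaking, and the speech attributed that way is used to train per-speaker Gaussian mixture models, so that diarization reduces to speaker recognition. The Python sketch below illustrates that idea only; the helper names, the scalar MHI activity feature, the likelihood-ratio scoring, and the use of NumPy/SciPy/scikit-learn are illustrative assumptions, not the authors' implementation.

# A minimal sketch of the gesture-then-speech pipeline described in the
# abstract. All names, features, and library choices are assumptions.

import numpy as np
from scipy.stats import gamma
from sklearn.mixture import GaussianMixture


def update_mhi(mhi, motion_mask, timestamp, duration):
    """Update a Motion History Image: pixels with motion get the current
    timestamp; pixels whose last motion is older than `duration` are cleared."""
    mhi = mhi.copy()
    mhi[motion_mask] = timestamp
    mhi[(~motion_mask) & (mhi < timestamp - duration)] = 0.0
    return mhi


def gesture_activity(mhi):
    """A simple scalar gestural-activity feature: the mean MHI value
    inside one person's region of the frame."""
    return float(mhi.mean())


def fit_gesture_model(activity_values):
    """Model one person's gestural activity with a gamma distribution."""
    shape, loc, scale = gamma.fit(activity_values, floc=0.0)
    return shape, loc, scale


def train_speaker_models(mfcc_by_speaker, n_components=16):
    """Train one GMM per speaker on the speech frames attributed to that
    speaker via gesture activity, plus a background GMM on all speech."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(np.vstack(list(mfcc_by_speaker.values())))
    models = {}
    for spk, feats in mfcc_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(feats)
        models[spk] = gmm
    return models, ubm


def label_segment(mfcc_segment, models, ubm):
    """Diarization as speaker recognition: assign the segment to the speaker
    whose GMM scores highest relative to the background model."""
    scores = {spk: gmm.score(mfcc_segment) - ubm.score(mfcc_segment)
              for spk, gmm in models.items()}
    return max(scores, key=scores.get)

In this reading of the abstract, the fitted gamma models would decide, per frame, whose gestural activity is high enough to attribute the concurrent speech to that person; that attribution supplies the hypothetical mfcc_by_speaker mapping used above.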

Details

Language(s): eng - English
Dates: 2014-06-10, 2014
Publication Status: Published online
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: -
Degree: -

Event

Title: Interspeech 2014: 15th Annual Conference of the International Speech Communication Association
Place of Event: Singapore
Start-/End Date: 2014-09-14 - 2014-09-18


Source 1

Title: Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association
Source Genre: Proceedings
Creator(s):
Li, H., Editor
Ching, P., Editor
Affiliations: -
Publ. Info: -
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 582 - 586
Identifier: -