  Speaker diarization using gesture and speech

Gebre, B. G., Wittenburg, P., Drude, S., Huijbregts, M., & Heskes, T. (2014). Speaker diarization using gesture and speech. In H. Li, & P. Ching (Eds.), Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 582-586).


Files

interspeech_paper.pdf (Preprint), 950KB
Name: interspeech_paper.pdf
Description: -
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata: -
Copyright Date: -
Copyright Info: -
License: -

Creators

Creators:
Gebre, Binyam Gebrekidan (1), Author
Wittenburg, Peter (1), Author
Drude, Sebastian (1), Author
Huijbregts, Marijn (2), Author
Heskes, Tom (2), Author
Affiliations:
(1) The Language Archive, MPI for Psycholinguistics, Max Planck Society, ou_530892
(2) Radboud University, ou_persistent22

Content

Free keywords: speaker diarization, gestures, speaker recognition, gaussian mixture models, motion history images
Abstract: We demonstrate how the problem of speaker diarization can be solved using both gesture and speaker parametric models. The novelty of our solution is that we approach the speaker diarization problem as a speaker recognition problem after learning speaker models from speech samples corresponding to gestures (the occurrence of gestures indicates the presence of speech, and the location of gestures indicates the identity of the speaker). This approach offers several advantages: performance comparable to the state of the art, faster computation, and greater adaptability. In our implementation, parametric models are used to model speakers' voices and their gestures: more specifically, Gaussian mixture models are used to model the voice characteristics of each person and of all persons, and gamma distributions are used to model gestural activity based on features extracted from Motion History Images. Tests on 4.24 hours of the AMI meeting data show that our solution yields DER improvements of 19% on speech-only segments and 4% on all segments including silence (the comparison is with the AMI system).
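
The abstract describes a two-stage idea: gestural activity, modeled with gamma distributions over Motion History Image features, indicates who is speaking, and the speech attributed that way is used to train per-speaker Gaussian mixture models, so that diarization reduces to speaker recognition. The Python sketch below illustrates that idea only; the helper names, the scalar MHI activity feature, the likelihood-ratio scoring, and the use of NumPy/SciPy/scikit-learn are illustrative assumptions, not the authors' implementation.

# A minimal sketch of the gesture-then-speech pipeline described in the
# abstract. All names, features, and library choices are assumptions.

import numpy as np
from scipy.stats import gamma
from sklearn.mixture import GaussianMixture


def update_mhi(mhi, motion_mask, timestamp, duration):
    """Update a Motion History Image: pixels with motion get the current
    timestamp; pixels whose last motion is older than `duration` are cleared."""
    mhi = mhi.copy()
    mhi[motion_mask] = timestamp
    mhi[(~motion_mask) & (mhi < timestamp - duration)] = 0.0
    return mhi


def gesture_activity(mhi):
    """A simple scalar gestural-activity feature: the mean MHI value
    inside one person's region of the frame."""
    return float(mhi.mean())


def fit_gesture_model(activity_values):
    """Model one person's gestural activity with a gamma distribution."""
    shape, loc, scale = gamma.fit(activity_values, floc=0.0)
    return shape, loc, scale


def train_speaker_models(mfcc_by_speaker, n_components=16):
    """Train one GMM per speaker on the speech frames attributed to that
    speaker via gesture activity, plus a background GMM on all speech."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(np.vstack(list(mfcc_by_speaker.values())))
    models = {}
    for spk, feats in mfcc_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(feats)
        models[spk] = gmm
    return models, ubm


def label_segment(mfcc_segment, models, ubm):
    """Diarization as speaker recognition: assign the segment to the speaker
    whose GMM scores highest relative to the background model."""
    scores = {spk: gmm.score(mfcc_segment) - ubm.score(mfcc_segment)
              for spk, gmm in models.items()}
    return max(scores, key=scores.get)

In this reading of the abstract, the fitted gamma models would decide, per frame, whose gestural activity is high enough to attribute the concurrent speech to that person; that attribution supplies the hypothetical mfcc_by_speaker mapping used above.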

Details

Language(s): eng - English
Dates: 2014-06-10, 2014
Publication Status: Published online
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: -
Degree: -

Event

Title: Interspeech 2014: 15th Annual Conference of the International Speech Communication Association
Place of Event: Singapore
Start-/End Date: 2014-09-14 - 2014-09-18


Source 1

Title: Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association
Source Genre: Proceedings
Creator(s):
Li, H., Editor
Ching, P., Editor
Affiliations: -
Publ. Info: -
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 582 - 586
Identifier: -