Talker diarization in the wild: The case of child-centered daylong audio-recordings

Cristia, Alejandrina; Ganesh, Shobhana; Casillas, Marisa; Ganapathy, Sriram

doi:10.21437/Interspeech.2018-2078

Cristia, A., Ganesh, S., Casillas, M., & Ganapathy, S. (2018). Talker diarization in the wild: The case of child-centered daylong audio-recordings. In Proceedings of Interspeech 2018 (pp. 2583-2587). doi:10.21437/Interspeech.2018-2078.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0001-9498-C

Version Permalink: https://hdl.handle.net/21.11116/0000-0002-5731-5

Genre: Conference Paper

Files

Cristia_etal_2018.pdf (Publisher version), 3MB

File Permalink:
https://hdl.handle.net/21.11116/0000-0002-5730-6

Name:
Cristia_etal_2018.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Cristia, Alejandrina¹, Author
Ganesh, Shobhana², Author
Casillas, Marisa³, Author
Ganapathy, Sriram², Author

Affiliations:
¹ LSCP, Département d’études cognitives, (ENS, EHESS, CNRS, PSL University), Paris, 75005, France, ou_persistent22
² Learning and Extraction of Acoustic Patterns (LEAP) lab, Electrical Engineering, Indian Institute of Science, Bangalore, 560012, India, ou_persistent22
³ Language Development Department, MPI for Psycholinguistics, Max Planck Society, ou_2340691

Content

Free keywords: speaker diarization, language acquisition, spontaneous speech, i-vectors

Abstract: Speaker diarization (answering 'who spoke when') is a widely researched subject within speech technology. Numerous experiments have been run on datasets built from broadcast news, meeting data, and call centers—the task sometimes appears close to being solved. Much less work has begun to tackle the hardest diarization task of all: spontaneous conversations in real-world settings. Such diarization would be particularly useful for studies of language acquisition, where researchers investigate the speech children produce and hear in their daily lives. In this paper, we study audio gathered with a recorder worn by small children as they went about their normal days. As a result, each child was exposed to different acoustic environments with a multitude of background noises and a varying number of adults and peers. The inconsistency of speech and noise within and across samples poses a challenging task for speaker diarization systems, which we tackled via retraining and data augmentation techniques. We further studied sources of structured variation across raw audio files, including the impact of speaker type distribution, proportion of speech from children, and child age on diarization performance. We discuss the extent to which these findings might generalize to other samples of speech in the wild.

Details

Language(s): eng - English

Dates: Submitted: 2018-03-26; Accepted: 2018-06-03; Published Online: 2018-10

Publication Status: Published online

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: Peer

Identifiers: DOI: 10.21437/Interspeech.2018-2078

Degree: -

Event

Title: Interspeech 2018

Place of Event: Hyderabad, India

Start-/End Date: 2018-09-02 - 2018-09-05

Source 1

Title: Proceedings of Interspeech 2018

Source Genre: Proceedings

Creator(s):

Affiliations:

Publ. Info: -

Pages: -

Volume / Issue: -

Sequence Number: -

Start / End Page: 2583 - 2587

Identifier: -