Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

Räsänen, Okko; Seshadri, Shreyas; Karadayi, Julien; Riebling, Eric; Bunce, John; Cristia, Alejandrina; Metze, Florian; Casillas, Marisa; Rosemberg, Celia; Bergelson, Elika; Soderstrom, Melanie

doi:10.1016/j.specom.2019.08.005

Räsänen, O., Seshadri, S., Karadayi, J., Riebling, E., Bunce, J., Cristia, A., et al. (2019). Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech. Speech Communication, 113, 63-80. doi:10.1016/j.specom.2019.08.005.

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0004-7D6F-5 Version Permalink: https://hdl.handle.net/21.11116/0000-0005-63B9-B

Genre: Journal Article

Files

Rasanen_etal_2019_Automatic word count....pdf (Publisher version), 3MB

File Permalink:
https://hdl.handle.net/21.11116/0000-0005-63B8-C

Name:
Rasanen_etal_2019_Automatic word count....pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
2019

Copyright Info:
© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license.

License:
https://creativecommons.org/licenses/by-nc-nd/4.0/

Creators

Creators:
Räsänen, Okko¹,², Author
Seshadri, Shreyas², Author
Karadayi, Julien³, Author
Riebling, Eric⁴, Author
Bunce, John⁵, Author
Cristia, Alejandrina³, Author
Metze, Florian⁴, Author
Casillas, Marisa⁶, Author
Rosemberg, Celia⁷, Author
Bergelson, Elika⁸, Author
Soderstrom, Melanie⁵, Author

Affiliations:
1 Unit of Computing Sciences, Tampere University, Tampere, Finland
2 Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland
3 Laboratoire de Sciences Cognitives et Psycholinguistique, Dept d'Etudes Cognitives, ENS, PSL University, EHESS, CNRS, Paris, France
4 Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
5 Department of Psychology, University of Manitoba, Manitoba, Canada
6 Language Development Department, MPI for Psycholinguistics, Max Planck Society
7 Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental, CONICET
8 Department of Psychology and Neuroscience, Duke University, Durham, NC, USA

Content

Free keywords: -

Abstract: Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is nearly trivial for high-quality signals in high-resource languages, daylong recordings are substantially more challenging due to the unconstrained acoustic environments and the presence of near- and far-field speech. Moreover, many use cases of interest involve languages for which reliable automatic speech recognition (ASR) systems or even well-defined lexicons are not available. A good WCE system should also perform similarly for low- and high-resource languages in order to enable unbiased comparisons across different cultures and environments. Unfortunately, the current state-of-the-art solution, the LENA system, is based on proprietary software and has only been optimized for American English, limiting its applicability. In this paper, we build on existing work on WCE and present the steps we have taken towards a freely available system for WCE that can be adapted to different languages or dialects with a limited amount of orthographically transcribed speech data. Our system is based on language-independent syllabification of speech, followed by a language-dependent mapping from syllable counts (and a number of other acoustic features) to the corresponding word count estimates. We evaluate our system on samples from daylong infant recordings from six different corpora consisting of several languages and socioeconomic environments, all manually annotated with the same protocol to allow direct comparison. We compare a number of alternative techniques for the two key components in our system: speech activity detection and automatic syllabification of speech.
As a result, we show that our system can reach relatively consistent WCE accuracy across multiple corpora and languages (with some limitations). In addition, the system outperforms LENA on three of the four corpora consisting of different varieties of English. We also demonstrate how an automatic neural network-based syllabifier, when trained on multiple languages, generalizes well to novel languages beyond the training data, outperforming two previously proposed unsupervised syllabifiers as a feature extractor for WCE.
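The abstract describes a two-stage design: syllables are counted in a language-independent way, and a language-dependent mapping, fitted on a limited amount of orthographically transcribed speech, converts syllable counts into word counts. The Python sketch below illustrates only the second stage, under a simplifying assumption: a single linear least-squares mapping from per-segment syllable counts to word counts. The function names `fit_syllable_to_word_mapping` and `estimate_word_count` and the toy numbers are hypothetical. The published system also feeds other acoustic features into the mapping and relies on speech activity detection and a syllabifier, none of which are reproduced here.

```python
from typing import Sequence, Tuple


def fit_syllable_to_word_mapping(
    syllable_counts: Sequence[float], word_counts: Sequence[float]
) -> Tuple[float, float]:
    """Fit word_count ~ slope * syllable_count + intercept by ordinary least squares.

    Hypothetical helper (an assumption, not the paper's code): both sequences hold
    per-segment counts from a small transcribed training set in the target language.
    """
    n = len(syllable_counts)
    if n != len(word_counts) or n < 2:
        raise ValueError("need at least two paired (syllables, words) observations")
    mean_x = sum(syllable_counts) / n
    mean_y = sum(word_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in syllable_counts)
    if sxx == 0:
        raise ValueError("syllable counts must vary across segments")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(syllable_counts, word_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_word_count(
    syllable_counts: Sequence[float], slope: float, intercept: float
) -> float:
    """Apply the fitted mapping to each speech segment and sum the estimates.

    Negative per-segment estimates are clipped to zero, since a segment
    cannot contain a negative number of words.
    """
    return sum(max(0.0, slope * s + intercept) for s in syllable_counts)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the paper.
    train_syllables = [3, 6, 9, 12]
    train_words = [2, 4, 6, 8]
    slope, intercept = fit_syllable_to_word_mapping(train_syllables, train_words)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
    print(f"estimated words: {estimate_word_count([3, 15], slope, intercept):.1f}")
```

In this sketch, adapting to a new language or dialect only means re-fitting two parameters on a small transcribed sample, while the syllable counter stays fixed; that split mirrors the language-independent and language-dependent stages named in the abstract.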

Details

Language(s): eng - English

Dates: Published online: 2019-08-14; Date issued: 2019-12

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: Peer

Identifiers: DOI: 10.1016/j.specom.2019.08.005

Degree: -

Source 1

Title: Speech Communication

Other: Speech Commun.

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Amsterdam, Netherlands : Elsevier

Pages: -
Volume / Issue: 113
Sequence Number: -
Start / End Page: 63 - 80
Identifier: ISSN: 0167-6393
CoNE: https://pure.mpg.de/cone/journals/resource/954925483662