  Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions

Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

Basic
Genre: Conference Paper

Files

Räsänen_Seshadri_Casillas_2018.pdf (Publisher version), 980KB
Name: Räsänen_Seshadri_Casillas_2018.pdf
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]

Creators

Räsänen, Okko1, Author
Seshadri, Shreyas1, Author
Casillas, Marisa2, Author
Affiliations:
1 Department of Signal Processing and Acoustics, Aalto University, Finland
2 Language Development Department, MPI for Psycholinguistics, Max Planck Society

Content

Free keywords: language acquisition, syllabification, word count estimation, daylong recordings, noise robustness
 Abstract: Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
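The abstract describes mapping automatically detected syllable counts to word counts via least squares. As a minimal illustrative sketch (not the paper's implementation; the toy counts and the affine form are assumptions), such a mapping can be fit on a small set of labeled utterances and then applied to syllable counts from new audio:

```python
import numpy as np

# Hypothetical training data: per-utterance syllable counts (from a
# syllabifier) and reference word counts (from transcripts).
syllables = np.array([12, 7, 20, 5, 15, 9], dtype=float)
words = np.array([8, 5, 13, 3, 10, 6], dtype=float)

# Fit an affine least-squares mapping: words ≈ a * syllables + b.
A = np.column_stack([syllables, np.ones_like(syllables)])
(a, b), *_ = np.linalg.lstsq(A, words, rcond=None)

# Apply the mapping to syllable counts from new (unlabeled) audio.
new_syllables = np.array([10.0, 18.0])
estimated_words = a * new_syllables + b
```

Because the mapping has only two free parameters, it needs very little labeled target-domain data, which is the low-resource property the abstract emphasizes.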

Details

Language(s): eng - English
Dates: 2018-03-26 / 2018-06-03 / 2018-10
Publication Status: Published online
Rev. Type: Peer
Identifiers: DOI: 10.21437/Interspeech.2018-1047

Event

Title: Interspeech 2018
Place of Event: Hyderabad, India
Start-/End Date: 2018-09-02 - 2018-09-06


Source 1

Title: Proceedings of Interspeech 2018
Source Genre: Proceedings
Start / End Page: 1200 - 1204