Unsupervised speech segmentation: An analysis of the hypothesized phone boundaries

Scharenborg, Odette; Wan, Vincent; Ernestus, Mirjam

doi:10.1121/1.3277194

Scharenborg, O., Wan, V., & Ernestus, M. (2010). Unsupervised speech segmentation: An analysis of the hypothesized phone boundaries. Journal of the Acoustical Society of America, 127, 1084-1095. doi:10.1121/1.3277194.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0012-65BE-E
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0012-65C1-4

Genre: Journal Article

Files

Scharenborg_Unsupervised_speech_segmentation_JASA_2010.pdf (Publisher version), 314KB

File Permalink:
https://hdl.handle.net/11858/00-001M-0000-0012-65BD-0

Name:
Scharenborg_Unsupervised_speech_segmentation_JASA_2010.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Scharenborg, Odette (1), Author
Wan, Vincent (2), Author
Ernestus, Mirjam (3, 4), Author

Affiliations:
(1) Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands
(2) Department of Computer Science, Speech and Hearing Research Group, University of Sheffield, United Kingdom
(3) Center for Language Studies, External organization
(4) Language Comprehension Group, MPI for Psycholinguistics, Max Planck Society, Nijmegen, NL

Content

Free keywords: -

Abstract: Despite using different algorithms, most unsupervised automatic phone segmentation methods achieve similar performance in terms of percentage correct boundary detection. Nevertheless, unsupervised segmentation algorithms are not able to perfectly reproduce manually obtained reference transcriptions. This paper investigates fundamental problems for unsupervised segmentation algorithms by comparing a phone segmentation obtained using only the acoustic information present in the signal with a reference segmentation created by human transcribers. The analyses of the output of an unsupervised speech segmentation method that uses acoustic change to hypothesize boundaries showed that acoustic change is a fairly good indicator of segment boundaries: over two-thirds of the hypothesized boundaries coincide with segment boundaries. Statistical analyses showed that the errors are related to segment duration, sequences of similar segments, and inherently dynamic phones. In order to improve unsupervised automatic speech segmentation, current one-stage bottom-up segmentation methods should be expanded into two-stage segmentation methods that are able to use a mix of bottom-up information extracted from the speech signal and automatically derived top-down information. In this way, unsupervised methods can be improved while remaining flexible and language-independent.

Details

Language(s):

Dates: Date issued: 2010

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: Peer

Identifiers: DOI: 10.1121/1.3277194

Degree: -

Source 1

Title: Journal of the Acoustical Society of America

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: New York, etc. : American Institute of Physics for the Acoustical Society of America.

Pages: -
Volume / Issue: 127
Sequence Number: -
Start / End Page: 1084 - 1095
Identifier: Other: 110975506069643
Other: 0001-4966