

Conference Paper

Automatic rating of hoarseness by text-based cepstral and prosodic evaluation

MPS-Authors

Moers,  Cornelia
International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, Nijmegen, NL;
Psychology of Language Department, MPI for Psycholinguistics, Max Planck Society;
University of Bonn, Department of Speech and Communication, Bonn, Germany;

Fulltext (public)

Haderlein_Moers_2012.pdf
(Publisher version), 156KB

Citation

Haderlein, T., Moers, C., Möbius, B., & Nöth, E. (2012). Automatic rating of hoarseness by text-based cepstral and prosodic evaluation. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Proceedings of the 15th International Conference on Text, Speech and Dialogue (TSD 2012) (pp. 573-580). Heidelberg: Springer.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-EB7A-4
Abstract
The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually performed on stable sections of sustained vowels. In this paper, text-based analysis and the established vowel-based analysis are compared with respect to their ability to measure hoarseness and its subclasses. 73 hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated roughness, breathiness, and hoarseness according to the German RBH evaluation scheme. The best human-machine correlations were obtained for measures based on the Cepstral Peak Prominence (CPP; up to |r| = 0.73). Support Vector Regression (SVR) on CPP-based measures and prosodic features improved the results further to r ≈ 0.8, confirming that automatic voice evaluation should be performed on a text recording.
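The regression step described in the abstract — predicting perceptual ratings from CPP-based and prosodic features, then correlating predictions with listener scores — can be sketched as follows. This is not the authors' implementation; the feature matrix, the synthetic ratings, and all parameter choices below are illustrative assumptions only.

```python
# Sketch: Support Vector Regression from acoustic features to an
# RBH-style perceptual rating, evaluated by Pearson correlation.
# All data here are synthetic placeholders, not the study's data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per speaker (73 patients, as
# in the study), columns standing in for a CPP-based measure plus
# two prosodic features.
X = rng.normal(size=(73, 3))

# Synthetic perceptual ratings on the RBH scale (0-3), loosely tied
# to the first feature so the regression has something to learn.
y = np.clip(1.5 - 1.0 * X[:, 0] + 0.2 * rng.normal(size=73), 0.0, 3.0)

# Scale features, then fit an RBF-kernel SVR (kernel and C are
# illustrative defaults, not values from the paper).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y)

# Human-machine agreement reported as a Pearson correlation,
# analogous to the |r| values quoted in the abstract.
pred = model.predict(X)
r = np.corrcoef(y, pred)[0, 1]
print(f"r = {r:.2f}")
```

In the paper itself the features are CPP-based measures and prosodic features computed from the text recordings, and the correlations are computed against the averaged expert RBH ratings rather than synthetic targets.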