Benutzerhandbuch Datenschutzhinweis Impressum Kontakt





Automatic rating of hoarseness by text-based cepstral and prosodic evaluation


Moers,  Cornelia
International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, Nijmegen, NL;
Psychology of Language Department, MPI for Psycholinguistics, Max Planck Society;
University of Bonn, Department of Speech and Communication,Bonn, Germany;

Externe Ressourcen
Es sind keine Externen Ressourcen verfügbar
Volltexte (frei zugänglich)

(Verlagsversion), 156KB

Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar

Haderlein, T., Moers, C., Möbius, B., & Nöth, E. (2012). Automatic rating of hoarseness by text-based cepstral and prosodic evaluation. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Proceedings of the 15th International Conference on Text, Speech and Dialogue (TSD 2012) (pp. 573-580). Heidelberg: Springer.

The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually done on stable sections of sustained vowels. In this paper, text-based and established vowel-based analysis are compared with respect to their ability to measure hoarseness and its subclasses. 73 hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated roughness, breathiness, and hoarseness according to the German RBH evaluation scheme. The best human-machine correlations were obtained for measures based on the Cepstral Peak Prominence (CPP; up to |r | = 0.73). Support Vector Regression (SVR) on CPP-based measures and prosodic features improved the results further to r ≈0.8 and confirmed that automatic voice evaluation should be performed on a text recording.