Speech rhythm measure of non-native speech using a statistical phoneme duration model

Hiroya, S., Jasmin, K., Krishnan, S., Lima, C., Ostarek, M., Boebinger, D., et al. (2016). Speech rhythm measure of non-native speech using a statistical phoneme duration model. Poster presented at the 8th Annual Meeting of the Society for the Neurobiology of Language, London, UK.

Cite as: http://hdl.handle.net/11858/00-001M-0000-002B-9CF2-5
We normally understand speech in our native language without effort. Recent brain imaging studies have revealed common cortical activation in left-lateralized motor areas during both speech production and speech perception. Moreover, this activity increases when listeners hear speech sounds with less natural frequency information, such as sine-wave speech and noise-vocoded speech. Rhythm is a natural part of speech, and languages differ in rhythm type: Japanese has a mora-timed rhythm, whereas English has a stress-timed rhythm. Native Japanese speakers tend to apply mora-timed rhythm when speaking English. However, few studies have investigated the neural mechanisms that process speech rhythm during speech perception. We developed a method for decomposing speech signals into speech rhythm and frequency information. English sentences spoken by a native Japanese speaker were manipulated so that their rhythm was either stress-timed, like English, or mora-timed, like Japanese. The stress-timed rhythm was obtained from the speech of a native British English speaker. Noise-vocoding was used to minimize the contribution of F0 and to control intelligibility across conditions. Twenty-one healthy, right-handed native English speakers participated. fMRI was used to image the participants' brains while they listened to the sentences. Results showed left-lateralized activation of the supplementary motor area (SMA), a region involved in speech production, that was greater for mora-timed (non-native) rhythm than for stress-timed rhythm. This suggests that integrating non-native speech rhythm with native-language speech may rely on increased auditory-motor processing. In behavioral testing, native English speakers judged the naturalness of the speaking rhythm of the sentences. The results confirmed that participants judged the English rhythm as the most natural. However, the difference between non-native rhythm and stress-timed English rhythm still needs to be quantified for further analysis.
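
Noise-vocoding keeps only the slow amplitude envelope in each frequency band and replaces the fine spectral structure, including F0 harmonics, with band-limited noise. The abstract does not give the vocoder settings, so the following is only a minimal numpy sketch: the function name `noise_vocode`, the number of bands, the 100 to 5000 Hz range, the log-spaced band edges, the brick-wall FFT filters, and the 30 Hz envelope cutoff are all assumptions, not the parameters used in the study.

```python
import numpy as np


def noise_vocode(signal, fs, n_bands=6, f_lo=100.0, f_hi=5000.0,
                 env_cutoff=30.0, seed=0):
    """Hypothetical minimal noise vocoder (all settings are illustrative assumptions)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec_x = np.fft.rfft(x)
    # Broadband Gaussian noise that will serve as the carrier in every band
    spec_noise = np.fft.rfft(np.random.default_rng(seed).standard_normal(n))
    # Logarithmically spaced analysis bands between f_lo and f_hi
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        # Ideal (brick-wall) band-pass filtering of speech and noise via the FFT
        speech_band = np.fft.irfft(np.where(in_band, spec_x, 0), n)
        noise_band = np.fft.irfft(np.where(in_band, spec_noise, 0), n)
        # Envelope: full-wave rectification followed by low-pass filtering,
        # which discards F0 and harmonic structure within the band
        rect_spec = np.fft.rfft(np.abs(speech_band))
        envelope = np.fft.irfft(np.where(freqs <= env_cutoff, rect_spec, 0), n)
        # Modulate the band-limited noise with the (non-negative) envelope
        out += np.maximum(envelope, 0.0) * noise_band
    # Scale the output to the RMS level of the input
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out
```

Fewer bands generally lower intelligibility, which is one way a vocoder can be used to control intelligibility across conditions; the band count here is only a placeholder.
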
A pairwise variability index (PVI) of vocalic intervals has been proposed as a measure of speech rhythm. Native Japanese speakers, however, tend to insert unnecessary vowels when speaking English, because a Japanese mora basically ends in a vowel. These extra vowels affect PVI values, so the PVI is not appropriate for quantifying the rhythm of non-native speech. In this study, we developed a statistical model of phonemic duration in English that does not depend on the type of interval. Speech stimuli consisted of English sentences (TIMIT) spoken by both native English and native Japanese speakers. The duration of each phoneme was determined by experts. The expectation-maximization (EM) algorithm was used to estimate a two-state transition model of phonemic duration for each native language; the mean durations of the two states were short and long, respectively. Results showed that the variability of the self-transition probabilities across states was significantly larger for the native Japanese speaker than for the native English speaker (p < 0.01). This indicates that long phonemic durations were repeated consecutively more often by native English speakers than by native Japanese speakers. This suggests that these structures of phonemic duration affected activity in the speech perception network.
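
The abstract does not give the model's equations, so the sketch below is only an illustration under stated assumptions. It computes the standard normalized PVI (nPVI) of a sequence of interval durations, and fits a two-state hidden Markov model to a sequence of phoneme durations with the EM (Baum-Welch) algorithm. The Gaussian emission density on log duration, the initialization, and the function names `npvi` and `fit_two_state_hmm` are assumptions, not the authors' implementation.

```python
import numpy as np


def npvi(durations):
    """Normalized PVI: 100 * mean of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    d = np.asarray(durations, dtype=float)
    diffs = np.abs(d[:-1] - d[1:])
    pair_means = (d[:-1] + d[1:]) / 2.0
    return 100.0 * float(np.mean(diffs / pair_means))


def fit_two_state_hmm(durations, n_iter=100, tol=1e-6):
    """Baum-Welch (EM) for a hypothetical 2-state HMM with Gaussian emissions
    on log duration. Returns (transition matrix, per-state geometric-mean
    duration, log-likelihood); state 0 is the short state, state 1 the long one.
    """
    x = np.log(np.asarray(durations, dtype=float))
    T, K = len(x), 2
    # Initialization (assumption): means at the 25th/75th percentiles
    mu = np.percentile(x, [25, 75])
    var = np.full(K, x.var() + 1e-6)
    A = np.array([[0.7, 0.3], [0.3, 0.7]])
    pi = np.full(K, 0.5)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: Gaussian emission likelihoods B[t, k]
        B = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        # Scaled forward pass; c[t] are the per-step normalizers
        alpha = np.zeros((T, K))
        c = np.zeros(T)
        alpha[0] = pi * B[0]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # Scaled backward pass
        beta = np.ones((T, K))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[t + 1] * beta[t + 1]) / c[t + 1]
        gamma = alpha * beta  # state posteriors; each row sums to 1
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        ll = float(np.log(c).sum())  # log-likelihood under current parameters
        # M-step: re-estimate initial, transition, and emission parameters
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        w = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / w
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / w + 1e-6
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)  # reorder so state 0 has the shorter mean duration
    return A[np.ix_(order, order)], np.exp(mu[order]), ll
```

After fitting one model per speaker group, `np.diag(A)` gives the two self-transition probabilities; the abstract does not specify exactly how their variability across states was summarized, so any such statistic computed from them is an assumption.
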