Abstract:
Typical speech rates in conversation or broadcast media are around 150 to 200 words per minute. Yet human listeners show an impressive degree of perceptual flexibility: with practice, they can understand speech presented at up to three times that rate (Dupoux & Green, 1997, JEP:HPP). However, exposure to time-compressed speech also produces a perceptual after-effect: normal speech sounds unnaturally slow immediately after listening to time-compressed speech. Both effects can readily be experienced using the speed controls built into most podcast players. The functional and neural mechanisms responsible for them, however, remain unspecified. In this work, we use behavioural and MEG experiments to explore the perceptual and neural processes that support speech rate adaptation and after-effects. We test whether and how these effects might arise from changes in the delta and theta oscillations in the Superior Temporal Gyrus that track connected speech.

In two behavioural experiments, we first quantified the magnitude of the perceptual after-effect in 14 native English speakers listening to feature podcasts from The Guardian (@guardianaudio). These experiments confirmed that: (1) participants report that speech at a natural rate sounds slower than normal after exposure to fast speech (50% time compression) and, conversely, that exposure to slowed speech (150% time expansion) leads listeners to report that natural speech sounds faster than normal; and (2) both after-effects depend on the duration of the adaptation period, with larger and longer-lasting after-effects observed after 60 seconds of exposure to time-compressed or time-expanded speech than after 20 seconds.

We also explored the neural correlates of this perceptual adaptation and these after-effects using MEG recordings from 16 native English listeners. During an initial 60-second period of natural speech, we observed significant (cluster-corrected) cerebro-acoustic coherence (cf. Peelle, Gross & Davis, 2013, Cerebral Cortex) between auditory MEG responses and the amplitude envelope of speech in the delta (0.1–3.2 Hz) and theta (4.7–8.2 Hz) ranges. During 40-second periods of adaptation to 60% time-compressed and 167% time-expanded speech, we observed significant increases (for 60% speech) and decreases (for 167% speech) in the peak frequency of delta, but not theta, entrainment. These effects built up over time, as shown by a significant interaction between time (0–20 s vs 20–40 s windows) and time compression/expansion (60% vs 167%) on both the magnitude (F(2,30) = 45.81, p < .001) and the peak frequency (F(2,30) = 6.96, p < .01) of delta coherence. However, the changes in the peak frequency of cerebro-acoustic coherence were smaller than the degree of compression or expansion applied to the speech. This suggests a limit on how flexibly neural oscillations can entrain to speech at different rates, even though the speech remained fully intelligible throughout. Although perceptual after-effects were pronounced in this group of listeners (as confirmed by post-MEG behavioural data), they were not associated with any reliable change in the magnitude or frequency of cerebro-acoustic coherence. We are currently analysing multivariate temporal receptive fields (cf. Crosse et al., 2016, Frontiers Hum Neurosci) to determine whether differences in the timing of oscillatory entrainment are linked to perceptual adaptation or after-effects. We will discuss the implications of these findings for oscillatory accounts of speech perception and comprehension.
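The coherence measure and the "smaller than the degree of compression" comparison described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study. It assumes a generic approach: a Hilbert-magnitude speech envelope, Welch-based magnitude-squared coherence from SciPy on a single synthetic "MEG" channel, and no source localisation or cluster-based statistics. The helper names (`amplitude_envelope`, `cerebro_acoustic_coherence`, `peak_frequency`, `predicted_peak`), the 30 Hz envelope cut-off and the synthetic signals are hypothetical. The delta and theta band limits are taken from the abstract.

```python
import numpy as np
from scipy.signal import butter, coherence, hilbert, sosfiltfilt

# Frequency bands as reported in the abstract (Hz).
DELTA_BAND = (0.1, 3.2)
THETA_BAND = (4.7, 8.2)


def amplitude_envelope(audio, fs, cutoff_hz=30.0):
    """Broadband amplitude envelope: |analytic signal|, then zero-phase low-pass.

    The 30 Hz cut-off is an illustrative assumption, not a study parameter.
    """
    env = np.abs(hilbert(audio))
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)


def cerebro_acoustic_coherence(meg, envelope, fs, nperseg):
    """Welch-based magnitude-squared coherence between one MEG channel and the envelope.

    Both signals must share the sampling rate fs (downsample the envelope first).
    """
    return coherence(meg, envelope, fs=fs, nperseg=nperseg)


def peak_frequency(freqs, coh, band):
    """Frequency of maximum coherence within [band[0], band[1]] Hz."""
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(coh[mask])]


def predicted_peak(baseline_peak_hz, duration_pct):
    """Peak expected if entrainment tracked the rate change perfectly.

    Playing speech in duration_pct % of its original duration multiplies its
    rate by 100 / duration_pct (60 % -> x1.67, 167 % -> x0.6).
    """
    return baseline_peak_hz * 100.0 / duration_pct


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 100.0                      # envelope/MEG sampling rate after downsampling
    n = int(60 * fs)                # 60 s, like the initial natural-speech period
    t = np.arange(n) / fs
    # Synthetic stand-ins: a 2 Hz "envelope" and a lagged, noisy "MEG" copy of it.
    envelope = np.sin(2 * np.pi * 2.0 * t) + 0.3 * rng.standard_normal(n)
    meg = np.roll(envelope, 10) + rng.standard_normal(n)

    freqs, coh = cerebro_acoustic_coherence(meg, envelope, fs, nperseg=1000)
    print("delta peak (Hz):", peak_frequency(freqs, coh, DELTA_BAND))
    print("theta max coherence:", coh[(freqs >= THETA_BAND[0]) & (freqs <= THETA_BAND[1])].max())
    print("full-scaling prediction for 60% speech (Hz):", predicted_peak(1.5, 60))
```

For example, with an illustrative baseline delta peak of 1.5 Hz (not a value reported here), `predicted_peak(1.5, 60)` gives 2.5 Hz. The abstract reports that the observed shifts in peak frequency fall short of such full-scaling predictions.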