

Released

Conference Paper

On the impact of language familiarity in talker change detection

MPS-Authors

Fink, Lauren
Department of Music, Max Planck Institute for Empirical Aesthetics, Max Planck Society;
Center for Mind and Brain, Univ. of California;

Citation

Sharma, N., Krishnamohan, V., Ganapathy, S., Gangopadhayay, A., & Fink, L. (2020). On the impact of language familiarity in talker change detection. In The Institute of Electrical and Electronics Engineers Signal Processing Society (Ed.), 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing: Proceedings (pp. 6249-6253). doi:10.1109/ICASSP40776.2020.9054294.


Cite as: https://hdl.handle.net/21.11116/0000-0008-3FDC-B
Abstract
The ability to detect talker changes when listening to conversational speech is fundamental to the perception and understanding of multi-talker speech. In this paper, we propose an experimental paradigm to provide insights into the impact of language familiarity on talker change detection. Two multi-talker speech stimulus sets are created, one in a language familiar to the listeners (English) and the other in an unfamiliar language (Chinese). A listening test is performed in which listeners indicate the number of talkers in the presented stimuli. Analysis of human performance shows statistically significant differences between the familiar and the unfamiliar language: (a) a lower miss rate (and a higher false alarm rate) in the familiar language, and (b) a longer response time in the familiar language. These results indicate a link between the perception of talker attributes and language proficiency. Subsequently, a machine system is designed to perform the same task. The system makes use of the current state-of-the-art diarization approach with x-vector embeddings. A performance comparison on the same stimulus set indicates that the machine system falls short of human performance by a large margin for both languages.
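For readers unfamiliar with the miss and false alarm rates mentioned in the abstract, the following is a minimal sketch of how such rates are conventionally computed in a detection task. It is not the authors' analysis code. It assumes each trial can be reduced to a binary ground truth (the stimulus contains a talker change or not) and a binary listener response; the `Trial` class, the function name, and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical trial record; field names are illustrative assumptions.
    has_change: bool       # ground truth: the stimulus contains a talker change
    reported_change: bool  # response: the listener reported more than one talker


def miss_and_false_alarm_rates(trials):
    """Return (miss_rate, false_alarm_rate) using standard detection definitions."""
    change = [t for t in trials if t.has_change]
    no_change = [t for t in trials if not t.has_change]
    # Miss: a talker change was present but the listener did not report it.
    misses = sum(not t.reported_change for t in change)
    # False alarm: no talker change was present but the listener reported one.
    false_alarms = sum(t.reported_change for t in no_change)
    miss_rate = misses / len(change) if change else float("nan")
    fa_rate = false_alarms / len(no_change) if no_change else float("nan")
    return miss_rate, fa_rate


# Made-up example data: 4 change trials (1 missed), 4 no-change trials (2 false alarms).
example_trials = [
    Trial(True, True), Trial(True, True), Trial(True, True), Trial(True, False),
    Trial(False, False), Trial(False, False), Trial(False, True), Trial(False, True),
]
miss_rate, fa_rate = miss_and_false_alarm_rates(example_trials)
```

Under these definitions, a lower miss rate together with a higher false alarm rate corresponds to listeners reporting talker changes more readily, which is the pattern the abstract describes for the familiar language.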