On the impact of language familiarity in talker change detection

Sharma, Neeraj; Krishnamohan, Venkat; Ganapathy, Sriram; Gangopadhayay, Ahana; Fink, Lauren

doi:10.1109/ICASSP40776.2020.9054294

Conference Paper

MPS-Authors

Fink, Lauren
Department of Music, Max Planck Institute for Empirical Aesthetics, Max Planck Society;
Center for Mind and Brain, Univ. of California

Citation

Sharma, N., Krishnamohan, V., Ganapathy, S., Gangopadhayay, A., & Fink, L. (2020). On the impact of language familiarity in talker change detection. In The Institute of Electrical and Electronics Engineers Signal Processing Society (Ed.), 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing: Proceedings (pp. 6249-6253). doi:10.1109/ICASSP40776.2020.9054294.

Cite as: https://hdl.handle.net/21.11116/0000-0008-3FDC-B

Abstract

The ability to detect talker changes when listening to conversational speech is fundamental to the perception and understanding of multi-talker speech. In this paper, we propose an experimental paradigm to provide insight into the impact of language familiarity on talker change detection. Two multi-talker speech stimulus sets are created, one in a language familiar to the listeners (English) and the other in an unfamiliar language (Chinese). A listening test is performed in which listeners indicate the number of talkers in the presented stimuli. Analysis of human performance shows statistically significant differences: (a) a lower miss rate (and a higher false alarm rate) in the familiar versus the unfamiliar language, and (b) a longer response time in the familiar versus the unfamiliar language. These results indicate a link between the perception of talker attributes and language proficiency. Subsequently, a machine system is designed to perform the same task. The system uses the current state-of-the-art diarization approach with x-vector embeddings. A performance comparison on the same stimulus set indicates that the machine system falls short of human performance by a large margin for both languages.