  Exploring emotional prototypes in a high dimensional TTS latent space

van Rijn, P., Mertes, S., Schiller, D., Harrison, P. M. C., Larrouy-Maestri, P., André, E., et al. (2021). Exploring emotional prototypes in a high dimensional TTS latent space. In Proceedings Interspeech 2021 (pp. 3870-3874). Baixas: ISCA. doi:10.21437/Interspeech.2021-1538.

Basic

Genre: Conference Paper

Creators

Creators:
van Rijn, Pol (1), Author
Mertes, Silvan (2), Author
Schiller, Dominik (2), Author
Harrison, Peter M. C. (3), Author
Larrouy-Maestri, Pauline (1, 4), Author
André, Elisabeth (2), Author
Jacoby, Nori (3), Author
Affiliations:
(1) Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Max Planck Society
(2) Human-Centered Artificial Intelligence, Augsburg, Germany
(3) Research Group Computational Auditory Perception, Max Planck Institute for Empirical Aesthetics, Max Planck Society
(4) Max-Planck-NYU, Center for Language, Music, and Emotion, New York, USA

Content

Free keywords: -
 Abstract: Recent TTS systems are able to generate prosodically varied and realistic speech. However, it is unclear how this prosodic variation contributes to the perception of speakers’ emotional states. Here we use the recent psychological paradigm ‘Gibbs Sampling with People’ to search the prosodic latent space in a trained Global Style Token Tacotron model to explore prototypes of emotional prosody. Participants are recruited online and collectively manipulate the latent space of the generative speech model in a sequentially adaptive way so that the stimulus presented to one group of participants is determined by the response of the previous groups. We demonstrate that (1) particular regions of the model’s latent space are reliably associated with particular emotions, (2) the resulting emotional prototypes are well-recognized by a separate group of human raters, and (3) these emotional prototypes can be effectively transferred to new sentences. Collectively, these experiments demonstrate a novel approach to the understanding of emotional speech by providing a tool to explore the relation between the latent space of generative models and human semantics.
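
The sequentially adaptive procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the participants are simulated, no speech is synthesized, and the names `simulated_choice` and `run_chain`, the slider grid, the hidden `target` vector, the noise model, and aggregation of a group's responses by the median are all assumptions made for illustration. In each iteration one latent dimension is adjusted by a group of (simulated) raters while the others stay fixed, and the aggregated response defines the stimulus shown to the next group.

```python
import random
import statistics


def simulated_choice(state, dim, target, grid, rng, noise):
    """Hypothetical stand-in for one participant moving a slider.

    The participant picks the grid value for dimension `dim` that brings
    the stimulus closest to a hidden prototype `target`, with noisy ratings.
    """
    def score(value):
        candidate = list(state)
        candidate[dim] = value
        dist = sum((c - t) ** 2 for c, t in zip(candidate, target))
        return -dist + rng.gauss(0.0, noise)

    return max(grid, key=score)


def run_chain(n_dims, target, n_iterations, group_size, noise=0.05, seed=0):
    """Run one chain: each iteration updates a single latent dimension."""
    rng = random.Random(seed)
    grid = [i / 10 - 1.0 for i in range(21)]  # assumed slider range [-1, 1]
    state = [rng.choice(grid) for _ in range(n_dims)]
    history = [tuple(state)]
    for iteration in range(n_iterations):
        dim = iteration % n_dims  # cycle through the dimensions
        choices = [
            simulated_choice(state, dim, target, grid, rng, noise)
            for _ in range(group_size)
        ]
        # Assumption: the group's responses are aggregated by the median.
        state[dim] = statistics.median(choices)
        history.append(tuple(state))
    return history


if __name__ == "__main__":
    chain = run_chain(n_dims=3, target=[0.5, -0.3, 0.0],
                      n_iterations=9, group_size=5)
    for step, point in enumerate(chain):
        print(step, point)
```

In a real experiment the scoring function would be replaced by human judgments of synthesized speech, and `target` would be unknown; the chain's later states are what would be read off as candidate prototypes.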

Details

Language(s): eng - English
 Dates: 2021
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.21437/Interspeech.2021-1538
 Degree: -

Event

Title: Interspeech 2021
Place of Event: Brno, Czechia
Start-/End Date: 2021-08-30 - 2021-09-03

Source 1

Title: Proceedings Interspeech 2021
Source Genre: Proceedings
 Creator(s):
Affiliations:
Publ. Info: Baixas : ISCA
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 3870 - 3874
Identifier: -