
Record


Released

Conference Paper

Giving robots a voice: Human-in-the-loop voice creation and open-ended labeling

MPG Authors

van Rijn, Pol
Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Max Planck Society


Jacoby, Nori
Research Group Computational Auditory Perception, Max Planck Institute for Empirical Aesthetics, Max Planck Society

External Resources
No external resources are available for this record.
Full Texts (restricted access)
No full texts are currently released for your IP range.
Full Texts (freely accessible)

24-cap-jac-03-giving.pdf
(publisher version), 15 MB

Supplementary Material (freely accessible)
No freely accessible supplementary materials are available.
Citation

van Rijn, P., Mertes, S., Janowski, K., Weitz, K., Jacoby, N., & André, E. (2024). Giving robots a voice: Human-in-the-loop voice creation and open-ended labeling. In F. F. Mueller, P. Kyburz, J. R. Williamson, C. Sas, M. L. Wilson, P. T. Dugas, et al. (Eds.), CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-34). doi:10.1145/3613904.3642038.


Citation link: https://hdl.handle.net/21.11116/0000-000F-600C-8
Abstract
Speech is a natural interface for humans to interact with robots. Yet, aligning a robot’s voice to its appearance is challenging due to the rich vocabulary of both modalities. Previous research has explored a few labels to describe robots and tested them on a limited number of robots and existing voices. Here, we develop a robot-voice creation tool followed by large-scale behavioral human experiments (N=2,505). First, participants collectively tune robotic voices to match 175 robot images using an adaptive human-in-the-loop pipeline. Then, participants describe their impression of the robot or their matched voice using another human-in-the-loop paradigm for open-ended labeling. The elicited taxonomy is then used to rate robot attributes and to predict the best voice for an unseen robot. We offer a web interface to aid engineers in customizing robot voices, demonstrating the synergy between cognitive science and machine learning for engineering tools.