Giving robots a voice: Human-in-the-loop voice creation and open-ended labeling

van Rijn, Pol; Mertes, Silvan; Janowski, Kathrin; Weitz, Katharina; Jacoby, Nori; André, Elisabeth

doi:10.1145/3613904.3642038

Conference Paper

MPS-Authors

van Rijn, Pol
Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Max Planck Society

Jacoby, Nori
Research Group Computational Auditory Perception, Max Planck Institute for Empirical Aesthetics, Max Planck Society

Fulltext (public)

24-cap-jac-03-giving.pdf
(Publisher version), 15MB

Citation

van Rijn, P., Mertes, S., Janowski, K., Weitz, K., Jacoby, N., & André, E. (2024). Giving robots a voice: Human-in-the-loop voice creation and open-ended labeling. In F. F. Mueller, P. Kyburz, J. R. Williamson, C. Sas, M. L. Wilson, P. T. Dugas, et al. (Eds.), CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-34). doi:10.1145/3613904.3642038.

Cite as: https://hdl.handle.net/21.11116/0000-000F-600C-8

Abstract

Speech is a natural interface for humans to interact with robots. Yet, aligning a robot’s voice to its appearance is challenging due to the rich vocabulary of both modalities. Previous research has explored a few labels to describe robots and tested them on a limited number of robots and existing voices. Here, we develop a robot-voice creation tool followed by large-scale behavioral human experiments (N=2,505). First, participants collectively tune robotic voices to match 175 robot images using an adaptive human-in-the-loop pipeline. Then, participants describe their impression of the robot or their matched voice using another human-in-the-loop paradigm for open-ended labeling. The elicited taxonomy is then used to rate robot attributes and to predict the best voice for an unseen robot. We offer a web interface to aid engineers in customizing robot voices, demonstrating the synergy between cognitive science and machine learning for engineering tools.