  MultiFacet: A multi-tasking framework for speech-to-sign language generation

Kanakanti, M., Singh, S., & Shrivastava, M. (2023). MultiFacet: A multi-tasking framework for speech-to-sign language generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, et al. (Eds.), ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 205-213). New York: ACM. doi:10.1145/3610661.3616550.

Basic data

Genre: Conference paper

Files

Kanakanti_etal_2023_multifacet.pdf (Publisher version), 5MB
Name:
Kanakanti_etal_2023_multifacet.pdf
Description:
-
OA status:
Gold
Visibility:
Public
MIME type / checksum:
application/pdf / [MD5]
Technical metadata:
Copyright date:
2023
Copyright info:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM
License:
-

External references

Creators

Creators:
Kanakanti, Mounika 1, 2, Author
Singh, Shantanu 2, Author
Shrivastava, Manish 2, Author
Affiliations:
1 Multimodal Language Department, MPI for Psycholinguistics, Max Planck Society, ou_3398547
2 International Institute of Information Technology, ou_persistent22

Content

Keywords: -
Abstract: Sign language is a rich form of communication, uniquely conveying meaning through a combination of gestures, facial expressions, and body movements. Existing research in sign language generation has predominantly focused on text-to-sign pose generation, while speech-to-sign pose generation remains relatively underexplored. Speech-to-sign language generation models can facilitate effective communication between the deaf and hearing communities. In this paper, we propose an architecture that utilises prosodic information from speech audio and semantic context from text to generate sign pose sequences. In our approach, we adopt a multi-tasking strategy that involves an additional task of predicting Facial Action Units (FAUs). FAUs capture the intricate facial muscle movements that play a crucial role in conveying specific facial expressions during sign language generation. We train our models on an existing Indian Sign Language dataset that contains sign language videos with audio and text translations. To evaluate our models, we report Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK) scores. We find that combining prosody and text as input, along with incorporating facial action unit prediction as an additional task, outperforms previous models in both DTW and PCK scores. We also discuss the challenges and limitations of speech-to-sign pose generation models to encourage future research in this domain. We release our models, results and code to foster reproducibility and encourage future research.
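
The abstract evaluates generated pose sequences with Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK). The following Python sketch illustrates how these two metrics can be computed; it is not the authors' released code, and the array shapes, keypoint normalisation, and the 0.2 PCK threshold are illustrative assumptions.

# Minimal sketch of DTW and PCK between predicted and reference sign-pose
# keypoint sequences (assumed shapes: (frames, keypoints, 2)).
import numpy as np

def dtw_distance(pred, ref):
    # Length-normalised DTW alignment cost; per-frame cost is the mean
    # Euclidean distance over keypoints.
    T1, T2 = len(pred), len(ref)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(pred[i - 1] - ref[j - 1], axis=-1).mean()
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[T1, T2] / (T1 + T2)

def pck(pred, ref, threshold=0.2):
    # Fraction of keypoints within `threshold` of the reference; assumes the
    # two sequences are time-aligned and already normalised (e.g. by shoulder
    # width). The 0.2 threshold is an assumed, illustrative choice.
    dists = np.linalg.norm(pred - ref, axis=-1)  # (frames, keypoints)
    return float((dists < threshold).mean())

# Toy usage with random sequences of different lengths
rng = np.random.default_rng(0)
pred = rng.normal(size=(40, 50, 2))  # 40 frames, 50 keypoints (x, y)
ref = rng.normal(size=(45, 50, 2))
print("DTW:", dtw_distance(pred, ref))
print("PCK:", pck(pred[:40], ref[:40]))

With pre-normalised keypoints, a lower DTW cost and a higher PCK score indicate closer agreement with the reference signing.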

Details

Language(s): eng - English
Date: 2023-10
Publication status: Published online
Pages: -
Place, publisher, edition: -
Table of contents: -
Review type: Peer review
Identifiers: DOI: 10.1145/3610661.3616550
Degree: -

Event

Title: 25th International Conference on Multimodal Interaction
Venue: Paris, France
Start/end date: 2023-10-09 - 2023-10-13

Decision

Project information

Source 1

Title: ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction
Source genre: Conference proceedings
Creators:
André, Elisabeth, Editor
Chetouani, Mohamed, Editor
Vaufreydaz, Dominique, Editor
Lucas, Gale, Editor
Schultz, Tanja, Editor
Morency, Louis-Philippe, Editor
Vinciarelli, Alessandro, Editor
Affiliations:
-
Place, publisher, edition: New York : ACM
Pages: - Volume / Issue: - Article number: - Start / end page: 205 - 213 Identifier: -