  MultiFacet: A multi-tasking framework for speech-to-sign language generation

Kanakanti, M., Singh, S., & Shrivastava, M. (2023). MultiFacet: A multi-tasking framework for speech-to-sign language generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, et al. (Eds.), ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 205-213). New York: ACM. doi:10.1145/3610661.3616550.

Basic

Genre: Conference Paper

Files

Kanakanti_etal_2023_multifacet.pdf (Publisher version), 5MB
Name: Kanakanti_etal_2023_multifacet.pdf
Description: -
OA-Status: Gold
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: 2023
Copyright Info: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM
License: -

Creators

Creators:
Kanakanti, Mounika (1, 2), Author
Singh, Shantanu (2), Author
Shrivastava, Manish (2), Author
Affiliations:
1. Multimodal Language Department, MPI for Psycholinguistics, Max Planck Society, ou_3398547
2. International Institute of Information Technology, ou_persistent22

Content

Free keywords: -
Abstract: Sign language is a rich form of communication, uniquely conveying meaning through a combination of gestures, facial expressions, and body movements. Existing research in sign language generation has predominantly focused on text-to-sign pose generation, while speech-to-sign pose generation remains relatively underexplored. Speech-to-sign language generation models can facilitate effective communication between the deaf and hearing communities. In this paper, we propose an architecture that utilises prosodic information from speech audio and semantic context from text to generate sign pose sequences. In our approach, we adopt a multi-tasking strategy that involves an additional task of predicting Facial Action Units (FAUs). FAUs capture the intricate facial muscle movements that play a crucial role in conveying specific facial expressions during sign language generation. We train our models on an existing Indian Sign Language dataset that contains sign language videos with audio and text translations. To evaluate our models, we report Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK) scores. We find that combining prosody and text as input, along with incorporating facial action unit prediction as an additional task, outperforms previous models in both DTW and PCK scores. We also discuss the challenges and limitations of speech-to-sign pose generation models to encourage future research in this domain. We release our models, results and code to foster reproducibility and encourage future research.
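
Note: as background to the DTW and PCK metrics named in the abstract, the following is a minimal NumPy sketch, not the authors' released code, of how such scores between a predicted and a reference pose sequence are typically computed; the array shapes, the bounding-box-based PCK threshold, the path-length normalisation, and the function names pck and dtw_cost are illustrative assumptions.

import numpy as np

def pck(pred, gt, alpha=0.2):
    # Probability of Correct Keypoints: fraction of predicted keypoints that lie
    # within alpha * per-frame reference scale of the ground truth.
    # pred, gt: arrays of shape (frames, keypoints, 2). Threshold conventions vary.
    mins = gt.min(axis=1)                                         # (frames, 2)
    maxs = gt.max(axis=1)                                         # (frames, 2)
    scale = np.linalg.norm(maxs - mins, axis=-1, keepdims=True)   # (frames, 1)
    dist = np.linalg.norm(pred - gt, axis=-1)                     # (frames, keypoints)
    return float((dist <= alpha * scale).mean())

def dtw_cost(pred, gt):
    # Dynamic Time Warping cost between two pose sequences of possibly different
    # lengths; frames are compared by mean keypoint distance.
    n, m = len(pred), len(gt)
    frame_dist = np.array([[np.linalg.norm(p - g, axis=-1).mean() for g in gt]
                           for p in pred])                        # (n, m)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = frame_dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                       acc[i, j - 1],
                                                       acc[i - 1, j - 1])
    return acc[n, m] / (n + m)                                    # normalise by path-length bound

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(120, 50, 2))                  # hypothetical 120-frame, 50-keypoint sequence
    pred = gt + rng.normal(scale=0.05, size=gt.shape)   # slightly perturbed prediction
    print(f"PCK: {pck(pred, gt):.3f}, DTW: {dtw_cost(pred, gt):.3f}")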

Details

Language(s): eng - English
 Dates: 2023-10
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1145/3610661.3616550
 Degree: -

Event

Title: 25th International Conference on Multimodal Interaction
Place of Event: Paris, France
Start-/End Date: 2023-10-09 - 2023-10-13

Source 1

Title: ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction
Source Genre: Proceedings
 Creator(s):
André, Elisabeth, Editor
Chetouani, Mohamed, Editor
Vaufreydaz, Dominique, Editor
Lucas, Gale, Editor
Schultz, Tanja, Editor
Morency, Louis-Philippe, Editor
Vinciarelli, Alessandro, Editor
Affiliations: -
Publ. Info: New York: ACM
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 205 - 213
Identifier: -