  MultiFacet: A multi-tasking framework for speech-to-sign language generation

Kanakanti, M., Singh, S., & Shrivastava, M. (2023). MultiFacet: A multi-tasking framework for speech-to-sign language generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, et al. (Eds.), ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 205-213). New York: ACM. doi:10.1145/3610661.3616550.

Basic

Genre: Conference Paper

Files

Kanakanti_etal_2023_multifacet.pdf (Publisher version), 5MB
Name: Kanakanti_etal_2023_multifacet.pdf
Description: -
OA-Status: Gold
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: 2023
Copyright Info: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM
License: -

Creators

Creators:
Kanakanti, Mounika (1, 2), Author
Singh, Shantanu (2), Author
Shrivastava, Manish (2), Author
Affiliations:
1. Multimodal Language Department, MPI for Psycholinguistics, Max Planck Society, ou_3398547
2. International Institute of Information Technology, ou_persistent22

Content

Free keywords: -
Abstract: Sign language is a rich form of communication, uniquely conveying meaning through a combination of gestures, facial expressions, and body movements. Existing research in sign language generation has predominantly focused on text-to-sign pose generation, while speech-to-sign pose generation remains relatively underexplored. Speech-to-sign language generation models can facilitate effective communication between the deaf and hearing communities. In this paper, we propose an architecture that utilises prosodic information from speech audio and semantic context from text to generate sign pose sequences. In our approach, we adopt a multi-tasking strategy that involves an additional task of predicting Facial Action Units (FAUs). FAUs capture the intricate facial muscle movements that play a crucial role in conveying specific facial expressions during sign language generation. We train our models on an existing Indian Sign Language dataset that contains sign language videos with audio and text translations. To evaluate our models, we report Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK) scores. We find that combining prosody and text as input, along with incorporating facial action unit prediction as an additional task, outperforms previous models in both DTW and PCK scores. We also discuss the challenges and limitations of speech-to-sign pose generation models to encourage future research in this domain. We release our models, results and code to foster reproducibility and encourage future research.
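
Note: as background to the DTW and PCK metrics named in the abstract, the following is a minimal NumPy sketch, not the authors' released code, of how such scores between a predicted and a reference pose sequence are typically computed; the array shapes, the bounding-box-based PCK threshold, the path-length normalisation, and the function names pck and dtw_cost are illustrative assumptions.

import numpy as np

def pck(pred, gt, alpha=0.2):
    # Probability of Correct Keypoints: fraction of predicted keypoints that lie
    # within alpha * per-frame reference scale of the ground truth.
    # pred, gt: arrays of shape (frames, keypoints, 2). Threshold conventions vary.
    mins = gt.min(axis=1)                                         # (frames, 2)
    maxs = gt.max(axis=1)                                         # (frames, 2)
    scale = np.linalg.norm(maxs - mins, axis=-1, keepdims=True)   # (frames, 1)
    dist = np.linalg.norm(pred - gt, axis=-1)                     # (frames, keypoints)
    return float((dist <= alpha * scale).mean())

def dtw_cost(pred, gt):
    # Dynamic Time Warping cost between two pose sequences of possibly different
    # lengths; frames are compared by mean keypoint distance.
    n, m = len(pred), len(gt)
    frame_dist = np.array([[np.linalg.norm(p - g, axis=-1).mean() for g in gt]
                           for p in pred])                        # (n, m)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = frame_dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                       acc[i, j - 1],
                                                       acc[i - 1, j - 1])
    return acc[n, m] / (n + m)                                    # normalise by path-length bound

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(120, 50, 2))                  # hypothetical 120-frame, 50-keypoint sequence
    pred = gt + rng.normal(scale=0.05, size=gt.shape)   # slightly perturbed prediction
    print(f"PCK: {pck(pred, gt):.3f}, DTW: {dtw_cost(pred, gt):.3f}")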

Details

Language(s): eng - English
 Dates: 2023-10
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1145/3610661.3616550
 Degree: -

Event

Title: 25th International Conference on Multimodal Interaction
Place of Event: Paris, France
Start-/End Date: 2023-10-09 - 2023-10-13

Source 1

Title: ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction
Source Genre: Proceedings
 Creator(s):
André, Elisabeth, Editor
Chetouani, Mohamed, Editor
Vaufreydaz, Dominique, Editor
Lucas, Gale, Editor
Schultz, Tanja, Editor
Morency, Louis-Philippe, Editor
Vinciarelli, Alessandro, Editor
Affiliations: -
Publ. Info: New York: ACM
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 205 - 213
Identifier: -