Abstract
Many modalities are naturally represented as continuous signals, making it difficult to use them with models that expect discrete units, such as LLMs. In this paper, we explore the use of audio compression techniques for the discrete representation of the gestures used in sign language. We train a tokenizer for American Sign Language (ASL) fingerspelling, which discretizes sequences of fingerspelling signs into tokens. We also propose a loss function to improve the interpretability of these tokens such that they preserve both the semantic and the visual information of the signal. We show that the proposed method improves the performance of the discretized sequence on downstream tasks.

- Anthology ID:
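The core operation behind audio-codec-style tokenizers like the one described above is vector quantization: each continuous frame of the signal is replaced by the index of its nearest codebook entry, turning a continuous sequence into discrete tokens. The following is a minimal, self-contained sketch of that idea with a toy hand-picked codebook and toy feature frames; it is illustrative only and is not the paper's trained tokenizer or its loss function.

```python
# Minimal sketch of vector quantization (VQ), the mechanism underlying
# audio-compression tokenizers: each continuous frame (e.g. a hand-pose
# feature vector) maps to the index of its nearest codebook vector.
# Codebook and frames below are toy values, not from the paper.

def quantize(frames, codebook):
    """Map each frame to the index of the closest codebook vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in frames
    ]

# Toy 2-D codebook with 3 entries and a 4-frame "signing" sequence.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8), (0.0, 0.0)]
tokens = quantize(frames, codebook)  # a discrete token per frame
```

In a trained codec the codebook entries are learned jointly with an encoder and decoder (and, in this paper, an additional loss encouraging the tokens to stay semantically and visually interpretable); the lookup step itself stays this simple.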
- 2024.emnlp-main.1104
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19786–19793
- URL:
- https://aclanthology.org/2024.emnlp-main.1104
- DOI:
- 10.18653/v1/2024.emnlp-main.1104
- Cite (ACL):
- Artem Abzaliev and Rada Mihalcea. 2024. Unsupervised Discrete Representations of American Sign Language. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 19786–19793, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Unsupervised Discrete Representations of American Sign Language (Abzaliev & Mihalcea, EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.1104.pdf