Abstract
This paper explores a novel method to modify existing pre-trained word embedding models of spoken languages for Sign Language glosses. These newly-generated embeddings are described, visualised, and then used in the encoder and/or decoder of models for the Text2Gloss and Gloss2Text tasks of machine translation. In two translation settings (one including data augmentation-based pre-training and a baseline), we find that bootstrapped word embeddings for glosses improve translation across four Signed/spoken language pairs. Many improvements are statistically significant, including those where the bootstrapped gloss embedding models are used.
Languages included: American Sign Language, Finnish Sign Language, Spanish Sign Language, Sign Language of The Netherlands.
- Anthology ID:
- 2024.eamt-1.13
- Volume:
- Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
- Month:
- June
- Year:
- 2024
- Address:
- Sheffield, UK
- Editors:
- Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation (EAMT)
- Pages:
- 116–132
- URL:
- https://aclanthology.org/2024.eamt-1.13
- Cite (ACL):
- Euan McGill, Luis Chiruzzo, and Horacio Saggion. 2024. Bootstrapping Pre-trained Word Embedding Models for Sign Language Gloss Translation. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 116–132, Sheffield, UK. European Association for Machine Translation (EAMT).
- Cite (Informal):
- Bootstrapping Pre-trained Word Embedding Models for Sign Language Gloss Translation (McGill et al., EAMT 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.eamt-1.13.pdf