@inproceedings{ortiz-pado-2025-low,
    title     = "Low-Resource Sign Language Glossing Profits From Data Augmentation",
    author    = "Ortiz, Diana Vania Lara and
                 Pad{\'o}, Sebastian",
    editor    = "Hasanuzzaman, Mohammed and
                 Quiroga, Facundo Manuel and
                 Modi, Ashutosh and
                 Kamila, Sabyasachi and
                 Artiaga, Keren and
                 Joshi, Abhinav and
                 Singh, Sanjeet",
    booktitle = "Proceedings of the Workshop on Sign Language Processing (WSLP)",
    month     = dec,
    year      = "2025",
    address   = "IIT Bombay, Mumbai, India (Co-located with IJCNLP{--}AACL 2025)",
    publisher = "Association for Computational Linguistics",
    url       = "https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wslp-main.3/",
    pages     = "14--19",
    isbn      = "979-8-89176-304-3",
    abstract  = "\textit{Glossing} is the task of translating from a written language into a sequence of \textit{glosses}, i.e., textual representations of signs from some sign language. While glossing is in principle `just' a machine translation (MT) task, sign languages still lack the large parallel corpora that exist for many written language pairs and underlie the development of dedicated MT systems. In this work, we demonstrate that glossing can be significantly improved through data augmentation. We fine-tune a Spanish transformer model both on a small dedicated corpus of 3,000 Spanish{--}Mexican Sign Language (MSL) gloss sentence pairs, and on a corpus augmented with an English{--}American Sign Language (ASL) gloss corpus. We obtain the best results when we oversample from the ASL corpus by a factor of {\textasciitilde}4, achieving a BLEU increase from 62 to 85 and a TER reduction from 44 to 20. This demonstrates the usefulness of combining corpora in low-resource glossing situations."
}

@comment{Informal Markdown citation, copied from the ACL Anthology page:
[Low-Resource Sign Language Glossing Profits From Data Augmentation](https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wslp-main.3/) (Ortiz & Pado, WSLP 2025)
ACL}