Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing
Antonielle Martins, Brenda S. Santana, Francielle Martins, Tatiana Lebedeff, Darley Nunes, Luisa Bohm
Abstract
The Libras-UFPel Corpus is a multimodal, multilayer parallel resource designed for the documentation and computational analysis of Brazilian Sign Language (Libras) in systematic alignment with written Portuguese. By integrating controlled recordings with naturalistic data from the Inventário Nacional de Libras-Pelotas, the corpus ensures interoperability through shared methodological standards. The dataset currently comprises 4,800 controlled audiovisual records (2,400 sentences and 2,400 isolated signs) fully paired with Portuguese translations, supplemented by approximately 10 hours of spontaneous interaction from three new naturalistic interviews, currently in the editing phase. To date, 1,200 controlled sentences have been lemmatized, gloss-annotated, and translated, providing a structured parallel subset for Libras-to-Portuguese Sign Language Processing tasks such as recognition and machine translation. The annotation model follows a hierarchical structure covering lexical, partially lexical, and non-lexical signs, including independent tiers for non-manual markers. By bridging descriptive linguistics and Natural Language Processing, the Libras-UFPel Corpus serves as a reference source for bilingual data-driven modeling, advancing digital inclusion and linguistic accessibility.
- Anthology ID:
- 2026.propor-1.112
- Volume:
- Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
- Month:
- April
- Year:
- 2026
- Address:
- Salvador, Brazil
- Editors:
- Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
- Venue:
- PROPOR
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1068–1073
- URL:
- https://preview.aclanthology.org/ingest-dnd/2026.propor-1.112/
- Cite (ACL):
- Antonielle Martins, Brenda S. Santana, Francielle Martins, Tatiana Lebedeff, Darley Nunes, and Luisa Bohm. 2026. Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 1068–1073, Salvador, Brazil. Association for Computational Linguistics.
- Cite (Informal):
- Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing (Martins et al., PROPOR 2026)
- PDF:
- https://preview.aclanthology.org/ingest-dnd/2026.propor-1.112.pdf