FLORES+ Mayas: Generating Textual Resources to Foster the Development of Language Technologies for Mayan Languages

Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena


Abstract
A significant percentage of the population of Guatemala and Mexico belongs to various Mayan indigenous communities, for whom language barriers lead to social, economic, and digital exclusion. The Mayan languages spoken by these communities remain severely underrepresented in terms of digital resources, which prevents their speakers from benefiting from the latest advances in artificial intelligence. This project addresses that problem through: 1) the digitisation and release of multiple printed linguistic resources; and 2) the development of a high-quality parallel machine translation (MT) evaluation corpus for six Mayan languages. In doing so, we are paving the way for the development of MT systems that will facilitate access for Mayan speakers to essential services such as healthcare or legal aid. The resources are produced with the essential participation of indigenous communities: native speakers provide the necessary translation services, quality assurance (QA), and linguistic expertise. The project is funded by the Google Academic Research Awards and carried out in collaboration with the Proyecto Lingüístico Francisco Marroquín Foundation in Guatemala.
Anthology ID:
2025.mtsummit-2.15
Volume:
Proceedings of Machine Translation Summit XX: Volume 2
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Editors:
Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
Venue:
MTSummit
Publisher:
European Association for Machine Translation
Pages:
89–90
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-2.15/
Cite (ACL):
Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, and Víctor M. Sánchez-Cartagena. 2025. FLORES+ Mayas: Generating Textual Resources to Foster the Development of Language Technologies for Mayan Languages. In Proceedings of Machine Translation Summit XX: Volume 2, pages 89–90, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal):
FLORES+ Mayas: Generating Textual Resources to Foster the Development of Language Technologies for Mayan Languages (Lou et al., MTSummit 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-2.15.pdf