Claudio Gutierrez
2026
What Resources Matter for Interlinear Glossing? Using LLMs and RAG for the Low-Resource Mapudungun Language
Anaís Almendra | Arianna Bisazza | Claudio Gutierrez | Felipe Hasler
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
Interlinear glossing is essential for the study and revitalization of endangered languages. However, it remains a time-consuming process that requires extensive linguistic expertise. Recent advances in Large Language Models (LLMs) offer a potential solution. In this work, we study automatic interlinear gloss generation for Mapudungun, an endangered language spoken in Chile and Argentina, using the Gemini 2.5 Pro model. We investigate which configuration of resources supplied through Retrieval-Augmented Generation (RAG) yields the best results. We compare the integration of a formal grammar, a dictionary, a small annotated corpus, and a combination of all these resources. Our evaluation shows that while dictionary integration causes a significant degradation in performance, grounding the model with a structured corpus maximizes accuracy relative to the resources employed. Notably, we find that a remarkably small dataset of 589 meaning units provides enough normative guidance to significantly improve the morphological tagging task. This work highlights the viability of using minimally annotated corpora to assist in the documentation of morphologically complex languages.
2022
Educational Tools for Mapuzugun
Cristian Ahumada | Claudio Gutierrez | Antonios Anastasopoulos
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
Mapuzugun is the language of the Mapuche people. For political and historical reasons, its number of speakers has decreased and the language has been excluded from the educational system in Chile and Argentina. For this reason, it is very important to support the revitalization of Mapuzugun in all spaces and media of society. In this work we present a tool for supporting Mapuzugun educational activities, tailored to the characteristics of the language. The tool consists of three parts: an orthography detector and converter; a morphological analyzer; and an informal translator. We also present a case study with Mapuzugun students showing promising results. Short abstract in Mapuzugun: Tüfachi küzaw pegelfi kiñe zugun küzawpeyüm kelluaetew pu mapuzugun chillkatufe kimal kizu tañi zugun.
2016
Dictionaries as Networks: Identifying the graph structure of Ogden’s Basic English
Camilo Garrido | Claudio Gutierrez
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
We study the network structure underlying dictionaries. We systematize the properties of such networks and show their relevance for linguistics. As a case study, we apply this technique to identify the graph structure of Ogden’s Basic English. We show that it constitutes a strong core of the English language network and that classic centrality measures fail to capture this set of words.