Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages

Micha Elsner; David Liu

Abstract

This paper presents VeLePa, an inflected verbal lexicon of Central Pame (pbs, cent2154), an Otomanguean language from Mexico. This resource contains 12528 words in phonological form representing the complete inflectional paradigms of 216 verbs, supplemented with use frequencies. Computer-operable (CLDF) inflected lexicons of non-WEIRD under-resourced languages are urgently needed to expand digital capacities in these languages (e.g., in NLP). VeLePa contributes to this, and does so with data from a language which is morphologically extraordinary, with unusually high levels of irregularity and multiple conjugations at various loci within the word: prefixes, stems, tone, and suffixes constitute different albeit interrelated subsystems of inflection.

Partly automated creation of interlinear glossed text (IGT) has the potential to assist in linguistic documentation. We argue that LLMs can make this process more accessible to linguists because of their capacity to follow natural-language instructions. We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task. Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category, and we show that a simple 3-best oracle has higher word-level scores than the challenge winner (a tuned sequence model) in five languages. In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature. Our results thus demonstrate the potential contributions that LLMs can make in interactive systems for glossing, both in making suggestions to human annotators and in following directions.

Anthology ID: 2025.sigmorphon-main.1
Volume: Proceedings of the 22nd SIGMORPHON workshop on Computational Morphology, Phonology, and Phonetics
Month: May
Year: 2025
Address: Albuquerque, New Mexico, USA
Editors: Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, Çağrı Çöltekin
Venues: SIGMORPHON | WS
SIG: SIGMORPHON
Publisher: Association for Computational Linguistics
Pages: 1–14
URL: https://preview.aclanthology.org/fix-sig-urls/2025.sigmorphon-main.1/
Cite (ACL): Micha Elsner and David Liu. 2025. Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages. In Proceedings of the 22nd SIGMORPHON workshop on Computational Morphology, Phonology, and Phonetics, pages 1–14, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages (Elsner & Liu, SIGMORPHON 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.sigmorphon-main.1.pdf
