IGT2P: From Interlinear Glossed Texts to Paradigms
Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, Mans Hulden
Abstract
An intermediate step in the linguistic analysis of an under-documented language is to find and organize inflected forms that are attested in natural speech. From this data, linguists generate unseen inflected word forms in order to test hypotheses about the language’s inflectional patterns and to complete inflectional paradigm tables. To obtain this data, linguists spend many hours manually creating interlinear glossed texts (IGTs). We introduce a new task that speeds up this process and automatically generates new morphological resources for natural language processing systems: IGT-to-paradigms (IGT2P). IGT2P generates entire morphological paradigms from IGT input. We show that existing morphological reinflection models can solve the task with 21% to 64% accuracy, depending on the language. We further find that (i) having a language expert spend only a few hours cleaning the noisy IGT data improves performance by as much as 21 percentage points, and (ii) POS tags, which are generally considered a necessary part of NLP morphological reinflection input, have no effect on the accuracy of the models considered here.
- Anthology ID:
- 2020.emnlp-main.424
- Volume:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5251–5262
- URL:
- https://aclanthology.org/2020.emnlp-main.424
- DOI:
- 10.18653/v1/2020.emnlp-main.424
- Cite (ACL):
- Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, and Mans Hulden. 2020. IGT2P: From Interlinear Glossed Texts to Paradigms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5251–5262, Online. Association for Computational Linguistics.
- Cite (Informal):
- IGT2P: From Interlinear Glossed Texts to Paradigms (Moeller et al., EMNLP 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.emnlp-main.424.pdf