Claire Post

2026

Linguistic Feature Tagging for Automatic Classification of 27 Closely-Related Quechua Varieties
Claire Post | Alexis Palmer
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

This paper presents a multi-dialect text classifier for Quechua that augments neural models with rule-based linguistic information to address challenges in low-resource, morphologically complex settings. The approach is built on a carefully curated dataset spanning multiple genres, including annotated parallel Bible corpora, and encodes manually annotated lexical variation and polypersonal verbal agreement as explicit features within a transformer-based classifier. Results show that neural models substantially outperform statistical baselines, enabling highly accurate multi-class classification across 27 Quechua dialects. The impact of linguistic augmentation is context-dependent: gains are minimal in high-resource settings but more pronounced in low-resource and cross-domain conditions. Overall, this work aims to contribute to the development of dialect-sensitive NLP methods for Quechua and other low-resource, morphologically rich languages.

2024


This paper describes the LECS Lab submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. The task requires transforming a base sentence with regard to one or more linguistic properties (such as negation or tense). We observe that this task shares many similarities with the well-studied task of word-level morphological inflection, and we explore whether the findings from inflection research are applicable to this task. In particular, we experiment with a number of augmentation strategies, finding that they can significantly benefit performance, but that not all augmented data is necessarily beneficial. Furthermore, we find that our character-level neural models show high variability with regard to performance on unseen data, and may not be the best choice when training data is limited.


Venues

AmericasNLP (2)
WS (2)
