Miguel Carpi


2024

Analysing and Validating Language Complexity Metrics Across South American Indigenous Languages
Felipe Serras | Miguel Carpi | Matheus Branco | Marcelo Finger
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Language complexity is an emerging concept critical for NLP and for quantitative and cognitive approaches to linguistics. In this work, we evaluate the behavior of a set of compression-based language complexity metrics when applied to a large set of native South American languages. Our goal is to validate the desirable properties of such metrics against a more diverse set of languages, guaranteeing the universality of the techniques developed on the basis of this type of theoretical artifact. Our analysis confirmed with statistical confidence most propositions about the metrics studied, affirming their robustness, despite showing less stability than when the same metrics were applied to Indo-European languages. We also observed that the trade-off between morphological and syntactic complexities is strongly related to language phylogeny.