Thomas Brochhagen
2026
AvarLab: An Integrated Digital Ecosystem for Avar, a Morphologically Rich Low-Resource Language
Kebed Zagidov | Thomas Brochhagen
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)
This paper presents a digital ecosystem designed for Avar, a morphologically rich and vulnerable Northeast Caucasian language. Addressing the common bottleneck where lexical resources, corpora, and computational tools are developed in isolation or are entirely absent, we propose the "generate-verify" workflow. By developing a scalable, rule-based computational architecture, our system specifically targets the challenges of low-resource settings, overcoming data sparsity to generate over one million inflected forms from a static dictionary of 14,700 entries. Furthermore, by coupling morphological generation with corpus verification, we introduce a dynamic method to rapidly analyze and expand endangered language data. This approach transforms static linguistic documentation into active language reclamation tools, supporting dictionary lookup and the creation of silver-standard annotations for downstream NLP. The platform also serves as a unified model for the collection, management, and mobilization of fragmented language data, ensuring that the resulting resources are directly accessible and beneficial to the speaker community. Ultimately, AvarLab provides a practical, adaptable pathway for building sustainable digital infrastructure by fostering interaction among documentary linguists, computer scientists, and native speakers.
2022
How Universal is Metonymy? Results from a Large-Scale Multilingual Analysis
Temuulen Khishigsuren | Gábor Bella | Thomas Brochhagen | Daariimaa Marav | Fausto Giunchiglia | Khuyagbaatar Batsuren
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Metonymy is regarded by most linguists as a universal cognitive phenomenon, especially since the emergence of the theory of conceptual mappings. However, the field data backing up claims of universality has not been large enough so far to provide conclusive evidence. We introduce a large-scale analysis of metonymy based on a lexical corpus of over 20 thousand metonymy instances from 189 languages and 69 genera. No prior study, to our knowledge, is based on linguistic coverage as broad as ours. Drawing on corpus analysis, evidence of universality is found at three levels: systematic metonymy in general, particular metonymy patterns, and specific metonymy concepts.
Horse or pony? Visual typicality and lexical frequency affect variability in object naming
Eleonora Gualdoni | Andreas Mädebach | Thomas Brochhagen | Gemma Boleda
Proceedings of the Society for Computation in Linguistics 2022
The interaction between cognitive ease and informativeness shapes the lexicons of natural languages
Thomas Brochhagen | Gemma Boleda
Proceedings of the Society for Computation in Linguistics 2022
The interaction between cognitive ease and informativeness shapes the lexicons of natural languages
Thomas Brochhagen | Gemma Boleda
Proceedings of the First Workshop on NLP applications to field linguistics
It is common for languages to express multiple meanings with the same word, a phenomenon known as colexification. For instance, the meanings FINGER and TOE colexify in the word “dedo” in Spanish, while they do not colexify in English. Colexification has been suggested to follow universal constraints. In particular, previous work has shown that related meanings are more prone to colexify. This tendency has been explained in terms of the cognitive pressure for ease, since expressing related meanings with the same word makes lexicons easier to learn and use. The present study examines the interplay between this pressure and a competing universal constraint, the functional pressure for languages to maximize informativeness. We hypothesize that meanings are more likely to colexify if they are related (fostering ease), but not so related as to become confusable and cause misunderstandings (fostering informativeness). We find support for this principle in data from over 1200 languages and 1400 meanings. Our results thus suggest that universal principles shape the lexicons of natural languages. More broadly, they contribute to the growing body of evidence suggesting that languages evolve to strike a balance between competing functional and cognitive pressures.