From Isolates to Families: Using Neural Networks for Automated Language Affiliation

Frederic Blum, Steffen Herbold, Johann-Mattis List


Abstract
In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow that relies on manually comparing individual languages. Large-scale standardized collections of multilingual wordlists and grammatical language structures might help to improve this and open new avenues for developing automated language affiliation workflows. Here, we present neural network models that use lexical and grammatical data from a worldwide sample of more than 1,200 languages with known affiliations to classify individual languages into families. In line with the traditional assumption of most linguists, our results show that models trained on lexical data alone outperform models solely based on grammatical data, whereas combining both types of data yields even better performance. In additional experiments, we show how our models can identify long-ranging relations between entire subgroups, how they can be employed to investigate potential relatives of linguistic isolates, and how they can help us to obtain first hints on the affiliation of so far unaffiliated languages. We conclude that models for automated language affiliation trained on lexical and grammatical data provide comparative linguists with a valuable tool for evaluating hypotheses about deep and unknown language relations.
Anthology ID:
2025.acl-long.876
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
17915–17927
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.876/
Cite (ACL):
Frederic Blum, Steffen Herbold, and Johann-Mattis List. 2025. From Isolates to Families: Using Neural Networks for Automated Language Affiliation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17915–17927, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
From Isolates to Families: Using Neural Networks for Automated Language Affiliation (Blum et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.876.pdf