Simple Additions, Substantial Gains: Expanding Scripts, Languages, and Lineage Coverage in URIEL+

Mason Shipton, York Hay Ng, Aditya Khan, Phuong H. Hoang, Xiang Lu, A. Seza Dogruoz, Annie En-Shiun Lee


Abstract
The URIEL+ linguistic knowledge base supports multilingual research by encoding languages through geographic, genetic, and typological vectors. However, data sparsity (e.g., missing feature types, incomplete language entries, and limited genealogical coverage) remains prevalent. This limits the usefulness of URIEL+ in cross-lingual transfer, particularly for supporting low-resource languages. To address this sparsity, we extend URIEL+ by introducing script vectors to represent writing system properties for 7,488 languages, integrating Glottolog to add 18,710 additional languages, and expanding lineage imputation for 26,449 languages by propagating typological and script features across genealogies. These improvements reduce feature sparsity by 14% for script vectors, increase language coverage by up to 19,015 languages (1,007%), and boost imputation quality metrics by up to 35%. Our benchmark on cross-lingual transfer tasks (oriented around low-resource languages) shows occasionally divergent performance compared to URIEL+, with performance gains of up to 6% in certain setups.
Anthology ID:
2026.lrec-main.863
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11045–11059
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.863/
Cite (ACL):
Mason Shipton, York Hay Ng, Aditya Khan, Phuong H. Hoang, Xiang Lu, A. Seza Dogruoz, and Annie En-Shiun Lee. 2026. Simple Additions, Substantial Gains: Expanding Scripts, Languages, and Lineage Coverage in URIEL+. International Conference on Language Resources and Evaluation, main:11045–11059.
Cite (Informal):
Simple Additions, Substantial Gains: Expanding Scripts, Languages, and Lineage Coverage in URIEL+ (Shipton et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.863.pdf