Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties

Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, David R. Mortensen


Abstract
Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer, which strives to align allied varieties and minimize the differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue that allows generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VAÇAÍ-Bowl architecture that learns variety-specific attributes with one branch while a parallel branch captures variety-invariant attributes using adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available, as a proxy for performance on other downstream tasks. Using VAÇAÍ-Bowl with TOPPing yields an average 54.62% improvement in the dependency parsing task across 10 low-resource varieties.
Anthology ID:
2026.conll-main.17
Volume:
Proceedings of the 30th Conference on Computational Natural Language Learning
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Claire Bonial, Yevgeni Berzak
Venues:
CoNLL | WS
Publisher:
Association for Computational Linguistics
Pages:
284–300
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.17/
Cite (ACL):
Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, and David R. Mortensen. 2026. Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 284–300, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties (Kim et al., CoNLL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.17.pdf