Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties
Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, David R. Mortensen
Abstract
Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer, which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue allowing generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VAÇAÍ-Bowl architecture that learns variety-specific attributes with one branch while a parallel branch captures variety-invariant attributes using adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available, as a proxy for performance on other downstream tasks. Using VAÇAÍ-Bowl with TOPPing yields an average 54.62% improvement in the dependency parsing task across 10 low-resource varieties.
- Anthology ID:
- 2026.conll-main.17
- Volume:
- Proceedings of the 30th Conference on Computational Natural Language Learning
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Claire Bonial, Yevgeni Berzak
- Venues:
- CoNLL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 284–300
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.17/
- Cite (ACL):
- Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, and David R. Mortensen. 2026. Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 284–300, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties (Kim et al., CoNLL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.17.pdf