Cross-Lingual Knowledge Augmentation for Mitigating Generic Overgeneralization in Multilingual Language Models

Sello Ralethe, Jan Buys


Abstract
Generic statements like “birds fly” or “lions have manes” express generalizations about kinds that admit exceptions, yet language models tend to overgeneralize them to universal claims. While previous work showed that the ASCENT KB could reduce this effect in English by 30–40%, the effectiveness of broader knowledge sources and the cross-lingual nature of the phenomenon remain unexplored. We investigate generic overgeneralization across English and four South African languages (isiZulu, isiXhosa, Sepedi, Sesotho), comparing the impact of ConceptNet and DBpedia against the previously used ASCENT KB. Our experiments show that ConceptNet reduces overgeneralization by 45–52% for minority-characteristic generics, while DBpedia achieves 48–58% for majority characteristics, with the combined knowledge bases reaching a 67% reduction. These improvements are consistent across all languages, though the Nguni languages show higher baseline overgeneralization than the Sotho-Tswana languages, suggesting that morphological features may influence this semantic bias. Our findings demonstrate that commonsense and encyclopedic knowledge provide complementary benefits for multilingual semantic understanding, offering insights for developing NLP systems that capture nuanced semantics in low-resource languages.
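To make the knowledge-augmentation step concrete, below is a minimal sketch, not taken from the paper, of how commonsense triples of the kind the abstract describes can be retrieved from ConceptNet's public REST API (api.conceptnet.io). The choice of the CapableOf relation and the idea of verbalizing triples into model input are illustrative assumptions, not the authors' pipeline.

# Minimal sketch (not the authors' code): fetch commonsense triples
# from the public ConceptNet REST API for a given concept.
import requests

def conceptnet_edges(concept: str, relation: str = "CapableOf", lang: str = "en"):
    """Return (start, relation, end) triples for `concept`, filtered to one relation."""
    url = f"http://api.conceptnet.io/c/{lang}/{concept}"
    data = requests.get(url, timeout=10).json()
    return [
        (e["start"]["label"], e["rel"]["label"], e["end"]["label"])
        for e in data.get("edges", [])
        if e["rel"]["label"] == relation
    ]

if __name__ == "__main__":
    # Prints triples such as ("a bird", "CapableOf", "fly"); one plausible use
    # (an assumption here) is verbalizing these as context so a model treats
    # "birds fly" as a generic with exceptions rather than a universal claim.
    for triple in conceptnet_edges("bird"):
        print(triple)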
Anthology ID:
2025.mrl-main.32
Volume:
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
David Ifeoluwa Adelani, Catherine Arnett, Duygu Ataman, Tyler A. Chang, Hila Gonen, Rahul Raja, Fabian Schmidt, David Stap, Jiayi Wang
Venues:
MRL | WS
Publisher:
Association for Computational Linguistics
Pages:
483–495
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.32/
Cite (ACL):
Sello Ralethe and Jan Buys. 2025. Cross-Lingual Knowledge Augmentation for Mitigating Generic Overgeneralization in Multilingual Language Models. In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), pages 483–495, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Knowledge Augmentation for Mitigating Generic Overgeneralization in Multilingual Language Models (Ralethe & Buys, MRL 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.32.pdf