Enzo Ferrante


2025

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh | Angelika Romanou | Clémentine Fourrier | David Ifeoluwa Adelani | Jian Gang Ngui | Daniel Vila-Suero | Peerat Limkonchotiwat | Kelly Marchisio | Wei Qi Leong | Yosephine Susanto | Raymond Ng | Shayne Longpre | Sebastian Ruder | Wei-Yin Ko | Antoine Bosselut | Alice Oh | Andre Martins | Leshem Choshen | Daphne Ippolito | Enzo Ferrante | Marzieh Fadaee | Beyza Ermis | Sara Hooker
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reliable multilingual evaluation is difficult, and culturally appropriate evaluation is even harder to achieve. A common practice to fill this gap is to machine-translate English evaluation sets. However, translation introduces language bias and carries over cultural and regional assumptions from the original questions – often testing knowledge irrelevant to the target audience. In this work, we highlight the extent and impact of these biases and present a multilingual evaluation framework that aims to mitigate them through improved translations and annotation practices. Through a large-scale study involving professional and community translators and annotators, we show that state-of-the-art models excel primarily by learning Western-centric concepts. Notably, we find that model rankings on the full MMLU change when evaluated on a subset of questions explicitly marked as culturally sensitive. We release Global MMLU, a multilingual extension of MMLU across 42 languages, featuring improved translation quality, expanded language coverage, and designated subsets labeled as culturally sensitive and culturally agnostic to enable a more comprehensive and equitable benchmark for evaluating language models across diverse linguistic and cultural contexts.