LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama

Naome A. Etori, Arturs Kanepajs, Kevin Lu, Randu Karisa


Abstract
This paper evaluates the language understanding capabilities of various large language models (LLMs) through an analysis of 112 translated and human-edited questions from the Massive Multitask Language Understanding (MMLU) dataset, focusing specifically on two underrepresented languages: Latvian and Giriama. The study compares the performance of six state-of-the-art (SOTA) models, with OpenAI’s o1-preview model demonstrating superior performance across all languages, significantly outperforming non-proprietary models in Latvian and all other models in Giriama. Human editing of automated translations from English to Latvian yielded only a small, statistically insignificant improvement in performance estimates, suggesting that machine-translated benchmarks may be sufficient for comparing model performance in languages with established digital resources like Latvian. However, automated translation to Giriama proved infeasible, and model performance in Giriama remained poor, highlighting the persistent challenges LLMs face with low-resource languages. These findings underscore the need for more comprehensive datasets and improved machine translation capabilities for underrepresented languages, while emphasizing the importance of localized benchmarks and human evaluation in addressing cultural and contextual limitations in AI models.
Anthology ID:
2025.nodalida-1.12
Volume:
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Richard Johansson, Sara Stymne
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
109–120
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.nodalida-1.12/
Cite (ACL):
Naome A. Etori, Arturs Kanepajs, Kevin Lu, and Randu Karisa. 2025. LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), pages 109–120, Tallinn, Estonia. University of Tartu Library.
Cite (Informal):
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama (Etori et al., NoDaLiDa 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.nodalida-1.12.pdf