SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia

Chaoqun Liu, Wenxuan Zhang, Jiahao Ying, Mahani Aljunied, Anh Tuan Luu, Lidong Bing


Abstract
This study introduces two novel benchmarks, SeaExam and SeaBench, designed to evaluate the capabilities of Large Language Models (LLMs) in Southeast Asian (SEA) application scenarios. Unlike existing multilingual datasets, which are primarily derived from English translations, these benchmarks are constructed from real-world scenarios in SEA regions. SeaExam draws on regional educational exams to form a comprehensive dataset covering subjects such as local history and literature. SeaBench, in contrast, is built around multi-turn, open-ended tasks that reflect daily interactions within SEA communities. Our evaluations demonstrate that SeaExam and SeaBench discriminate LLM performance on SEA language tasks more effectively than translated benchmarks, highlighting the importance of real-world queries for assessing the multilingual capabilities of LLMs.
Anthology ID: 2025.findings-naacl.341
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6119–6136
URL: https://preview.aclanthology.org/corrections-2025-06/2025.findings-naacl.341/
DOI: 10.18653/v1/2025.findings-naacl.341
Cite (ACL):
Chaoqun Liu, Wenxuan Zhang, Jiahao Ying, Mahani Aljunied, Anh Tuan Luu, and Lidong Bing. 2025. SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 6119–6136, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia (Liu et al., Findings 2025)
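BibTeX:
The page's Bibkey field is empty, so the entry below is reconstructed from the metadata above; the citation key follows the usual ACL Anthology naming pattern and is an assumption rather than the official key.
The citation key below is assumed, not taken from the page:
@inproceedings{liu-etal-2025-seaexam,
    title = "{S}ea{E}xam and {S}ea{B}ench: Benchmarking {LLM}s with Local Multilingual Questions in {S}outheast {A}sia",
    author = "Liu, Chaoqun and Zhang, Wenxuan and Ying, Jiahao and Aljunied, Mahani and Luu, Anh Tuan and Bing, Lidong",
    editor = "Chiruzzo, Luis and Ritter, Alan and Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/corrections-2025-06/2025.findings-naacl.341/",
    doi = "10.18653/v1/2025.findings-naacl.341",
    pages = "6119--6136"
}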
PDF: https://preview.aclanthology.org/corrections-2025-06/2025.findings-naacl.341.pdf