Yosephine Susanto


2025

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh | Angelika Romanou | Clémentine Fourrier | David Ifeoluwa Adelani | Jian Gang Ngui | Daniel Vila-Suero | Peerat Limkonchotiwat | Kelly Marchisio | Wei Qi Leong | Yosephine Susanto | Raymond Ng | Shayne Longpre | Sebastian Ruder | Wei-Yin Ko | Antoine Bosselut | Alice Oh | Andre Martins | Leshem Choshen | Daphne Ippolito | Enzo Ferrante | Marzieh Fadaee | Beyza Ermis | Sara Hooker
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reliable multilingual evaluation is difficult, and culturally appropriate evaluation is even harder to achieve. A common practice to fill this gap is to machine-translate English evaluation sets. However, translation introduces language bias and carries over cultural and regional assumptions from the original questions – often testing knowledge irrelevant to the target audience. In this work, we highlight the extent and impact of these biases and present a multilingual evaluation framework that aims to mitigate them through improved translations and annotation practices. Through a large-scale study involving professional and community translators and annotators, we show that state-of-the-art models excel primarily by learning Western-centric concepts. Notably, we find that model rankings on the full MMLU change when evaluated on a subset of questions explicitly marked as culturally sensitive. We release Global MMLU, a multilingual extension of MMLU across 42 languages, featuring improved translation quality, expanded language coverage, and designated subsets labeled as culturally sensitive and culturally agnostic to enable a more comprehensive and equitable benchmark for evaluating language models across diverse linguistic and cultural contexts.

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models
Yosephine Susanto | Adithya Venkatadri Hulagadri | Jann Railey Montalan | Jian Gang Ngui | Xianbin Yong | Wei Qi Leong | Hamsawardhini Rengarajan | Peerat Limkonchotiwat | Yifan Mai | William Chandra Tjhi
Findings of the Association for Computational Linguistics: ACL 2025

With the rapid emergence of novel capabilities in Large Language Models (LLMs), the need for rigorous multilingual and multicultural benchmarks that are integrated has become more pronounced. Though existing LLM benchmarks are capable of evaluating specific capabilities of LLMs in English as well as in various mid- to low-resource languages, including those in the Southeast Asian (SEA) region, a comprehensive and culturally representative evaluation suite for the SEA languages has not been developed thus far. Here, we present SEA-HELM, a holistic linguistic and cultural LLM evaluation suite that emphasises SEA languages, comprising five core pillars: (1) NLP CLASSICS, (2) LLM-SPECIFICS, (3) SEA LINGUISTICS, (4) SEA CULTURE, (5) SAFETY. SEA-HELM currently supports Filipino, Indonesian, Tamil, Thai, and Vietnamese. We also introduce the SEA-HELM leaderboard, which allows users to understand models’ multilingual and multicultural performance in a systematic and user-friendly manner. We make the SEA-HELM evaluation code publicly available.

SEA-LION: Southeast Asian Languages in One Network
Raymond Ng | Thanh Ngan Nguyen | Huang Yuli | Tai Ngee Chia | Leong Wai Yi | Wei Qi Leong | Xianbin Yong | Jian Gang Ngui | Yosephine Susanto | Nicholas Cheng | Hamsawardhini Rengarajan | Peerat Limkonchotiwat | Adithya Venkatadri Hulagadri | Kok Wai Teng | Yeo Yeow Tong | Bryan Siow | Wei Yi Teo | Tan Choon Meng | Brandon Ong | Zhi Hao Ong | Jann Railey Montalan | Adwin Chan | Sajeban Antonyrex | Ren Lee | Esther Choa | David Ong Tat-Wee | Bing Jie Darius Liu | William Chandra Tjhi | Erik Cambria | Leslie Teo
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Recently, Large Language Models (LLMs) have dominated much of the artificial intelligence scene with their ability to process and generate natural languages. However, the majority of LLM research and development remains English-centric, leaving low-resource languages such as those in the Southeast Asian (SEA) region under-represented. To address this representation gap, we introduce Llama-SEA-LION-v3-8B-IT and Gemma-SEA-LION-v3-9B-IT, two cutting-edge multilingual LLMs designed for SEA languages. The SEA-LION family of LLMs supports 11 SEA languages, namely English, Chinese, Indonesian, Vietnamese, Malay, Thai, Burmese, Lao, Filipino, Tamil, and Khmer. Our work leverages large-scale multilingual continued pre-training with a comprehensive post-training regime involving multiple stages of instruction fine-tuning, alignment, and model merging. Evaluation results on multilingual benchmarks indicate that our models achieve state-of-the-art performance across LLMs supporting SEA languages. We open-source the models to benefit the wider SEA community.

2024

Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino
Jann Railey Montalan | Jian Gang Ngui | Wei Qi Leong | Yosephine Susanto | Hamsawardhini Rengarajan | Alham Fikri Aji | William Chandra Tjhi
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation