Michael Anugraha - ACL Anthology

Michael Anugraha

2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Samuel Cahyawijaya | Holy Lovenia | Joel Ruben Antony Moniz | Tack Hwa Wong | Mohammad Rifqi Farhansyah | Thant Thiri Maung | Frederikus Hudi | David Anugraha | Muhammad Ravi Shulthan Habibi | Muhammad Reza Qorib | Amit Agarwal | Joseph Marvin Imperial | Hitesh Laxmichand Patel | Vicky Feliren | Bahrul Ilmi Nasution | Manuel Antonio Rufino | Genta Indra Winata | Rian Adam Rajagede | Carlos Rafael Catalan | Mohamed Fazli Mohamed Imam | Priyaranjan Pattnayak | Salsabila Zahirah Pranida | Kevin Pratama | Yeshil Bangera | Adisai Na-Thalang | Patricia Nicole Monderin | Yueqi Song | Christian Simon | Lynnette Hui Xian Ng | Richardy Lobo Sapan | Taki Hasan Rafi | Bin Wang | Supryadi | Kanyakorn Veerakanjana | Piyalitt Ittichaiwong | Matthew Theodore Roque | Karissa Vincentio | Takdanai Kreangphet | Phakphum Artkaew | Kadek Hendrawan Palgunadi | Yanzhi Yu | Rochana Prih Hastuti | William Nixon | Mithil Bangera | Adrian Xuan Wei Lim | Aye Hninn Khine | Hanif Muhammad Zhafran | Teddy Ferdinan | Audra Aurora Izzani | Ayushman Singh | Evan Evan | Jauza Akbar Krito | Michael Anugraha | Fenal Ashokbhai Ilasariya | Haochen Li | John Amadeo Daniswara | Filbert Aurelian Tjiaranata | Eryawan Presma Yulianrifat | Can Udomcharoenchaikit | Fadil Risdian Ansori | Mahardika Krisna Ihsani | Giang Nguyen | Anab Maulana Barik | Dan John Velasco | Rifo Ahmad Genadi | Saptarshi Saha | Chengwei Wei | Isaiah Edri W. Flores | Kenneth Chen Ko Han | Anjela Gail D. Santos | Wan Shen Lim | Kaung Si Phyo | Tim Santos | Meisyarah Dwiastuti | Jiayun Luo | Jan Christian Blaise Cruz | Ming Shan Hee | Ikhlasul Akmal Hanif | M.Alif Al Hakim | Muhammad Rizky Sya’ban | Kun Kerdthaisong | Lester James Validad Miranda | Fajri Koto | Tirana Noor Fatyanosa | Alham Fikri Aji | Jostin Jerico Rosal | Jun Kevin | Robert Wijaya | Onno P. Kampman | Ruochen Zhang | Börje F. Karlsson | Peerat Limkonchotiwat
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite Southeast Asia’s (SEA) extraordinary linguistic and cultural diversity, the region remains significantly underrepresented in vision-language (VL) research, resulting in AI models that inadequately capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing culturally relevant high-quality datasets for SEA languages. By involving contributors from SEA countries, SEA-VL ensures better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages and cultural depictions in VL research. Our methodology employed three approaches: community-driven crowdsourcing with SEA contributors, automated image crawling, and synthetic image generation. We evaluated each method’s effectiveness in capturing cultural relevance. We found that image crawling achieves approximately 85% cultural relevance while being more cost- and time-efficient than crowdsourcing, whereas synthetic image generation failed to accurately reflect SEA cultural nuances and contexts. Collectively, we gathered 1.28 million SEA culturally relevant images, more than 50 times larger than other existing datasets. This work bridges the representation gap in SEA, establishes a foundation for developing culturally aware AI systems for this region, and provides a replicable framework for addressing representation gaps in other underrepresented regions.

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata | Frederikus Hudi | Patrick Amadeus Irawan | David Anugraha | Rifki Afina Putri | Wang Yutong | Adam Nohejl | Ubaidillah Ariq Prathama | Nedjma Ousidhoum | Afifa Amriani | Anar Rzayev | Anirban Das | Ashmari Pramodya | Aulia Adila | Bryan Wilie | Candy Olivia Mawalim | Cheng Ching Lam | Daud Abolade | Emmanuele Chersoni | Enrico Santus | Fariz Ikhwantri | Garry Kuwanto | Hanyang Zhao | Haryo Akbarianto Wibowo | Holy Lovenia | Jan Christian Blaise Cruz | Jan Wira Gotama Putra | Junho Myung | Lucky Susanto | Maria Angelica Riera Machin | Marina Zhukova | Michael Anugraha | Muhammad Farid Adilazuarda | Natasha Christabelle Santosa | Peerat Limkonchotiwat | Raj Dabre | Rio Alexander Audino | Samuel Cahyawijaya | Shi-Xiong Zhang | Stephanie Yulia Salim | Yi Zhou | Yinxuan Gui | David Ifeoluwa Adelani | En-Shiun Annie Lee | Shogo Okada | Ayu Purwarianti | Alham Fikri Aji | Taro Watanabe | Derry Tanti Wijaya | Alice Oh | Chong-Wah Ngo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.

Co-authors

Peerat Limkonchotiwat 2

Genta Indra Winata 2

David Ifeoluwa Adelani 1

Muhammad Farid Adilazuarda 1

Afifa Amriani 1

Fadil Risdian Ansori 1

Phakphum Artkaew 1

Rio Alexander Audino 1

Yeshil Bangera 1

Mithil Bangera 1

Anab Maulana Barik 1

Carlos Rafael Catalan 1

Emmanuele Chersoni 1

John Amadeo Daniswara 1

Meisyarah Dwiastuti 1

Mohammad Rifqi Farhansyah 1

Tirana Noor Fatyanosa 1

Vicky Feliren 1

Teddy Ferdinan 1

Isaiah Edri W. Flores 1

Rifo Ahmad Genadi 1

Muhammad Ravi Shulthan Habibi 1

M.Alif Al Hakim 1

Kenneth Chen Ko Han 1

Ikhlasul Akmal Hanif 1

Rochana Prih Hastuti 1

Ming Shan Hee 1

Mahardika Krisna Ihsani 1

Fariz Ikhwantri 1

Fenal Ashokbhai Ilasariya 1

Mohamed Fazli Mohamed Imam 1

Joseph Marvin Imperial 1

Patrick Amadeus Irawan 1

Piyalitt Ittichaiwong 1

Audra Aurora Izzani 1

Onno P. Kampman 1

Börje F. Karlsson 1

Kun Kerdthaisong 1

Aye Hninn Khine 1

Takdanai Kreangphet 1

Jauza Akbar Krito 1

Garry Kuwanto 1

Cheng Ching Lam 1

En-Shiun Annie Lee 1

Adrian Xuan Wei Lim 1

Thant Thiri Maung 1

Candy Olivia Mawalim 1

Lester James Validad Miranda 1

Patricia Nicole Monderin 1

Joel Ruben Antony Moniz 1

Adisai Na-Thalang 1

Bahrul Ilmi Nasution 1

Lynnette Hui Xian Ng 1

Chong-Wah Ngo 1

William Nixon 1

Nedjma Ousidhoum 1

Kadek Hendrawan Palgunadi 1

Hitesh Laxmichand Patel 1

Priyaranjan Pattnayak 1

Kaung Si Phyo 1

Ashmari Pramodya 1

Salsabila Zahirah Pranida 1

Kevin Pratama 1

Ubaidillah Ariq Prathama 1

Ayu Purwarianti 1

Jan Wira Gotama Putra 1

Rifki Afina Putri 1

Muhammad Reza Qorib 1

Taki Hasan Rafi 1

Rian Adam Rajagede 1

Maria Angelica Riera Machin 1

Matthew Theodore Roque 1

Jostin Jerico Rosal 1

Manuel Antonio Rufino 1

Saptarshi Saha 1

Stephanie Yulia Salim 1

Anjela Gail D. Santos 1

Natasha Christabelle Santosa 1

Enrico Santus 1

Richardy Lobo Sapan 1

Christian Simon 1

Ayushman Singh 1

Lucky Susanto 1

Muhammad Rizky Sya’ban 1

Filbert Aurelian Tjiaranata 1

Can Udomcharoenchaikit 1

Kanyakorn Veerakanjana 1

Dan John Velasco 1

Karissa Vincentio 1

Taro Watanabe 1

Haryo Akbarianto Wibowo 1

Robert Wijaya 1

Derry Tanti Wijaya 1

Tack Hwa Wong 1

Eryawan Presma Yulianrifat 1

Hanif Muhammad Zhafran 1

Ruochen Zhang 1

Shi-Xiong Zhang 1

Marina Zhukova 1

Venues

ACL 1

NAACL 1