Ashmari Pramodya


2025

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata | Frederikus Hudi | Patrick Amadeus Irawan | David Anugraha | Rifki Afina Putri | Wang Yutong | Adam Nohejl | Ubaidillah Ariq Prathama | Nedjma Ousidhoum | Afifa Amriani | Anar Rzayev | Anirban Das | Ashmari Pramodya | Aulia Adila | Bryan Wilie | Candy Olivia Mawalim | Cheng Ching Lam | Daud Abolade | Emmanuele Chersoni | Enrico Santus | Fariz Ikhwantri | Garry Kuwanto | Hanyang Zhao | Haryo Akbarianto Wibowo | Holy Lovenia | Jan Christian Blaise Cruz | Jan Wira Gotama Putra | Junho Myung | Lucky Susanto | Maria Angelica Riera Machin | Marina Zhukova | Michael Anugraha | Muhammad Farid Adilazuarda | Natasha Christabelle Santosa | Peerat Limkonchotiwat | Raj Dabre | Rio Alexander Audino | Samuel Cahyawijaya | Shi-Xiong Zhang | Stephanie Yulia Salim | Yi Zhou | Yinxuan Gui | David Ifeoluwa Adelani | En-Shiun Annie Lee | Shogo Okada | Ayu Purwarianti | Alham Fikri Aji | Taro Watanabe | Derry Tanti Wijaya | Alice Oh | Chong-Wah Ngo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.
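As a rough illustration of how the dish-name identification task could be scored, the sketch below assumes a hypothetical JSON-lines export of the evaluation set with image_path, question, language, and answer fields, plus a user-supplied vlm_predict function; these names are illustrative assumptions, not the released data format or code.

    import json
    from collections import defaultdict

    def evaluate_dish_naming(jsonl_path, vlm_predict):
        """Exact-match accuracy per language on a (hypothetical) WorldCuisines-style VQA export.

        vlm_predict(image_path, question) is assumed to return the model's answer string.
        """
        correct, total = defaultdict(int), defaultdict(int)
        with open(jsonl_path, encoding="utf-8") as f:
            for line in f:
                ex = json.loads(line)  # assumed fields: image_path, question, language, answer
                pred = vlm_predict(ex["image_path"], ex["question"])
                total[ex["language"]] += 1
                if pred.strip().lower() == ex["answer"].strip().lower():
                    correct[ex["language"]] += 1
        return {lang: correct[lang] / total[lang] for lang in total}

Such a per-language breakdown would make the reported gap between well-resourced and underrepresented languages directly visible.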

2023

Exploring Low-resource Neural Machine Translation for Sinhala-Tamil Language Pair
Ashmari Pramodya
Proceedings of the 8th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing

At present, Neural Machine Translation is a promising approach for machine translation. Transformer-based deep learning architectures in particular show a substantial performance increase in translating between various language pairs. However, many low-resource language pairs still struggle to lend themselves to Neural Machine Translation due to their data-hungry nature. In this article, we investigate methods of expanding the parallel corpus to enhance translation quality within a model training pipeline, starting from the initial collection of parallel data to the training process of baseline models. Grounded on state-of-the-art Neural Machine Translation approaches such as hyper-parameter tuning, and data augmentation with forward and backward translation, we define a set of best practices for improving Tamil-to-Sinhala machine translation and empirically validate our methods using standard evaluation metrics. Our results demonstrate that the Neural Machine Translation models trained on larger amounts of back-translated data outperform other synthetic data generation approaches in Transformer base training settings. We further demonstrate that, even for language pairs with limited resources, Transformer models are able to tune to outperform existing state-of-the-art Statistical Machine Translation models by as much as 3.28 BLEU points in the Tamil to Sinhala translation scenarios.