Emilio Villa-Cueva
Also published as: Emilio Villa Cueva
2026
Afri-MCQA: Multimodal Cultural Question Answering for African Languages
Atnafu Lambebo Tonja | Srija Anand | Emilio Villa-Cueva | Israel Abebe Azime | Jesujoba Oluwadara Alabi | Muhidin A. Mohamed | Debela Desalegn Yadeta | Negasi Haile Abadi | Abigail Oppong | Nnaemeka Casmir Obiefuna | Idris Abdulmumin | Naome A Etori | Eric Peter Wairagala | Kanda Patrick Tshinu | Imanigirimbabazi Emmanuel | Gabofetswe Malema | Alham Fikri Aji | David Ifeoluwa Adelani | Thamar Solorio
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Africa is home to over one-third of the world’s languages, yet remains severely underrepresented in multimodal AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark containing 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel text and speech modalities and was entirely created by native speakers. We find that models show poor performance across evaluated cultures, with near-zero accuracy on open-ended VQA when queried through native language or speech. To test linguistic competence, we include control experiments meant to assess this specific aspect separate from cultural knowledge, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the pressing need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. We release Afri-MCQA to support more inclusive multimodal AI development.
2025
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell | Giuseppe Attanasio | Ioana Baldini | Miruna Clinciu | Jordan Clive | Pieter Delobelle | Manan Dey | Sil Hamilton | Timm Dill | Jad Doughman | Ritam Dutt | Avijit Ghosh | Jessica Zosa Forde | Carolin Holtermann | Lucie-Aimée Kaffee | Tanmay Laud | Anne Lauscher | Roberto L Lopez-Davila | Maraim Masoud | Nikita Nangia | Anaelia Ovalle | Giada Pistilli | Dragomir Radev | Beatrice Savoldi | Vipul Raheja | Jeremy Qin | Esther Ploeger | Arjun Subramonian | Kaustubh Dhole | Kaiser Sun | Amirbek Djanibekov | Jonibek Mansurov | Kayo Yin | Emilio Villa Cueva | Sagnik Mukherjee | Jerry Huang | Xudong Shen | Jay Gala | Hamdan Al-Ali | Tair Djanibekov | Nurdaulet Mukhituly | Shangrui Nie | Shanya Sharma | Karolina Stanczak | Eliza Szczechla | Tiago Timponi Torrent | Deepak Tunuguntla | Marcelo Viridiano | Oskar Van Der Wal | Adina Yakefu | Aurélie Névéol | Mike Zhang | Sydney Zink | Zeerak Talat
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging behind the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset SHADES to help address this issue, designed for examining culturally-specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.
MoMentS: A Comprehensive Multimodal Benchmark for Theory of Mind
Emilio Villa-Cueva | S M Masrur Ahmed | Rendi Chevi | Jan Christian Blaise Cruz | Kareem Elzeky | Fermin Cristobal | Alham Fikri Aji | Skyler Wang | Rada Mihalcea | Thamar Solorio
Findings of the Association for Computational Linguistics: EMNLP 2025
Understanding Theory of Mind (ToM) is essential for building socially intelligent multimodal agents capable of perceiving and interpreting human behavior. We introduce MoMentS (Multimodal Mental States), a comprehensive benchmark designed to assess the ToM capabilities of multimodal large language models (MLLMs) through realistic, narrative-rich scenarios presented in short films. MoMentS includes over 2,300 multiple-choice questions spanning seven distinct ToM categories. The benchmark features long video context windows and realistic social interactions that provide deeper insight into characters’ mental states. We evaluate several MLLMs and find that although vision generally improves performance, models still struggle to integrate it effectively. For audio, models that process dialogues as audio do not consistently outperform transcript-based inputs. Our findings highlight the need to improve multimodal integration and point to open challenges that must be addressed to advance AI’s social understanding.
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Emilio Villa-Cueva | Sholpan Bolatzhanova | Diana Turmakhan | Kareem Elzeky | Henok Biadglign Ademtew | Alham Fikri Aji | Vladimir Araujo | Israel Abebe Azime | Jinheon Baek | Frederico Belcavello | Fermin Cristobal | Jan Christian Blaise Cruz | Mary Dabre | Raj Dabre | Toqeer Ehsan | Naome A Etori | Fauzan Farooqui | Jiahui Geng | Guido Ivetta | Thanmay Jayakumar | Soyeong Jeong | Zheng Wei Lim | Aishik Mandal | Sofía Martinelli | Mihail Minkov Mihaylov | Daniil Orel | Aniket Pramanick | Sukannya Purkayastha | Israfel Salazar | Haiyue Song | Tiago Timponi Torrent | Debela Desalegn Yadeta | Injy Hamed | Atnafu Lambebo Tonja | Thamar Solorio
Findings of the Association for Computational Linguistics: EMNLP 2025
Translating cultural content poses challenges for machine translation systems due to the differences in conceptualizations between cultures, where language alone may fail to convey sufficient context to capture region-specific meanings. In this work, we investigate whether images can act as cultural context in multimodal translation. We introduce CaMMT, a human-curated benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. Using this dataset, we evaluate five Vision Language Models (VLMs) in text-only and text+image settings. Through automatic and human evaluations, we find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs), disambiguation, and correct gender marking. By releasing CaMMT, our objective is to support broader efforts to build and evaluate multimodal translation systems that are better aligned with cultural nuance and regional variations.
2023
Walter Burns at SemEval-2023 Task 5: NLP-CIMAT - Leveraging Model Ensembles for Clickbait Spoiling
Emilio Villa Cueva | Daniel Vallejo Aldana | Fernando Sánchez Vega | Adrián Pastor López Monroy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes our participation in the Clickbait challenge at SemEval 2023. In this work, we address the Clickbait classification task using transformer models in an ensemble configuration. We tackle the Spoiler Generation task using a two-level ensemble strategy of models trained for extractive QA, selecting the best K candidates for multi-part spoilers. In the test partitions, our approaches obtained an accuracy of 0.716 for classification and a BLEU-4 score of 0.439 for spoiler generation.
Co-authors
- Alham Fikri Aji 3
- Thamar Solorio 3
- Israel Abebe Azime 2
- Fermin Cristobal 2
- Jan Christian Blaise Cruz 2
- Kareem Elzeky 2
- Naome A. Etori 2
- Atnafu Lambebo Tonja 2
- Tiago Timponi Torrent 2
- Debela Desalegn Yadeta 2
- Negasi Haile Abadi 1
- Idris Abdulmumin 1
- David Ifeoluwa Adelani 1
- Henok Biadglign Ademtew 1
- S M Masrur Ahmed 1
- Hamdan Al-Ali 1
- Jesujoba Alabi 1
- Srija Anand 1
- Vladimir Araujo 1
- Giuseppe Attanasio 1
- Jinheon Baek 1
- Ioana Baldini 1
- Frederico Belcavello 1
- Sholpan Bolatzhanova 1
- Rendi Chevi 1
- Miruna Clinciu 1
- Jordan Clive 1
- Mary Dabre 1
- Raj Dabre 1
- Pieter Delobelle 1
- Manan Dey 1
- Kaustubh Dhole 1
- Timm Dill 1
- Amirbek Djanibekov 1
- Jad Doughman 1
- Ritam Dutt 1
- Toqeer Ehsan 1
- Imanigirimbabazi Emmanuel 1
- Fauzan Farooqui 1
- Jessica Zosa Forde 1
- Jay Gala 1
- Jiahui Geng 1
- Avijit Ghosh 1
- Injy Hamed 1
- Sil Hamilton 1
- Carolin Holtermann 1
- Jerry Huang 1
- Guido Ivetta 1
- Thanmay Jayakumar 1
- Soyeong Jeong 1
- Lucie-Aimée Kaffee 1
- Tanmay Laud 1
- Anne Lauscher 1
- Zheng Wei Lim 1
- Adrian Pastor Lopez Monroy 1
- Roberto L Lopez-Davila 1
- Gabofetswe Malema 1
- Aishik Mandal 1
- Jonibek Mansurov 1
- Sofía Martinelli 1
- Maraim Masoud 1
- Rada Mihalcea 1
- Mihail Minkov Mihaylov 1
- Margaret Mitchell 1
- Muhidin A. Mohamed 1
- Sagnik Mukherjee 1
- Nurdaulet Mukhituly 1
- Nikita Nangia 1
- Aurelie Neveol 1
- Shangrui Nie 1
- Nnaemeka Casmir Obiefuna 1
- Abigail Oppong 1
- Daniil Orel 1
- Anaelia Ovalle 1
- Giada Pistilli 1
- Esther Ploeger 1
- Aniket Pramanick 1
- Sukannya Purkayastha 1
- Jeremy Qin 1
- Dragomir Radev 1
- Vipul Raheja 1
- Israfel Salazar 1
- Fernando Sanchez-Vega 1
- Beatrice Savoldi 1
- Shanya Sharma 1
- Xudong Shen 1
- Haiyue Song 1
- Karolina Stanczak 1
- Arjun Subramonian 1
- Kaiser Sun 1
- Eliza Szczechla 1
- Tair Djanibekov 1
- Zeerak Talat 1
- Kanda Patrick Tshinu 1
- Deepak Tunuguntla 1
- Diana Turmakhan 1
- Daniel Vallejo Aldana 1
- Oskar Van Der Wal 1
- Marcelo Viridiano 1
- Eric Peter Wairagala 1
- Skyler Wang 1
- Adina Yakefu 1
- Kayo Yin 1
- Mike Zhang 1
- Sydney Zink 1