Melika Nobakhtian
2026
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
Pedro Ortiz Suarez | Laurie Burchell | Catherine Arnett | Rafael Mosquera | Sara Hincapi\'e Monsalve | Thom Vaughan | Damian Stewart | Malte Ostendorff | Idris Abdulmumin | Vukosi Marivate | Shamsuddeen Hassan Muhammad | Atnafu Lambebo Tonja | Hend Al-Khalifa | Nadia Ghezaiel Hammouda | Verrah Akinyi Otiende | Tack Hwa Wong | Jakhongir Saydaliev | Melika Nobakhtian | Muhammad Ravi Shulthan Habibi | Chalamalasetti Kranti | Carol Muchemi | Khang Nguyen | Faisal Muhammad Adam | Luis Frentzen Salim | Reem Alqifari | Cynthia Jayne Amol | Joseph Marvin Imperial | Ilker Kesen | Ahmad Mustafid | Pavel Stepachev | Leshem Choshen | David Anugraha | Hamada Nayel | Seid Muhie Yimam | Vallerie Alexandra Putra | My Chiffon Nguyen | Azmine Toushik Wasi | Gouthami Vadithya | Rob Van Der Goot | Lanwenn ar C'horr | Karan Dua | Andrew Yates | Mithil Bangera | Yeshil Bangera | Hitesh Laxmichand Patel | Shu Okabe | Fenal Ashokbhai Ilasariya | Dmitry Gaynullin | Genta Indra Winata | Yiyuan Li | Juan Pablo Mart{\'\i}nez | Amit Agarwal | Ikhlasul Akmal Hanif | Raia Abu Ahmad | Esther Adenuga | Filbert Aurelian Tjiaranata | Weerayut Buaphet | Michael Anugraha | Sowmya Vajjala | Benjamin L Rice | Azril Hafizi Amirudin | Jesujoba Oluwadara Alabi | Srikant Panda | Yassine Toughrai | Bruhan Kyomuhendo | Daniel Ruffinelli | Akshata | Manuel Goul\~ao | Ej Zhou | Ingrid Gabriela Franco Ramirez | Cristina Aggazzotti | Konstantin Dobler | Jun Kevin | Quentin Pag\`es | Nicholas Andrews | Nuhu Ibrahim | Mattes Ruckdeschel | Amr Keleg | Mike Zhang | Casper Rufaro Muziri | Saron Samuel | Sotaro Takeshita | Kun Kerdthaisong | Luca Foppiano | Rasul Dent | Tommaso Green | Ahmad Mustapha Wali | Kamohelo Makaaka | Vicky Feliren | Inshirah Idris | Hande Celikkanat | Abdulhamid Abubakar | Jean Maillard | Beno{\^\i}t Sagot | Thibault Cl\'erice | Kenton Murray | Sarah K. K. Luger
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Pedro Ortiz Suarez | Laurie Burchell | Catherine Arnett | Rafael Mosquera | Sara Hincapi\'e Monsalve | Thom Vaughan | Damian Stewart | Malte Ostendorff | Idris Abdulmumin | Vukosi Marivate | Shamsuddeen Hassan Muhammad | Atnafu Lambebo Tonja | Hend Al-Khalifa | Nadia Ghezaiel Hammouda | Verrah Akinyi Otiende | Tack Hwa Wong | Jakhongir Saydaliev | Melika Nobakhtian | Muhammad Ravi Shulthan Habibi | Chalamalasetti Kranti | Carol Muchemi | Khang Nguyen | Faisal Muhammad Adam | Luis Frentzen Salim | Reem Alqifari | Cynthia Jayne Amol | Joseph Marvin Imperial | Ilker Kesen | Ahmad Mustafid | Pavel Stepachev | Leshem Choshen | David Anugraha | Hamada Nayel | Seid Muhie Yimam | Vallerie Alexandra Putra | My Chiffon Nguyen | Azmine Toushik Wasi | Gouthami Vadithya | Rob Van Der Goot | Lanwenn ar C'horr | Karan Dua | Andrew Yates | Mithil Bangera | Yeshil Bangera | Hitesh Laxmichand Patel | Shu Okabe | Fenal Ashokbhai Ilasariya | Dmitry Gaynullin | Genta Indra Winata | Yiyuan Li | Juan Pablo Mart{\'\i}nez | Amit Agarwal | Ikhlasul Akmal Hanif | Raia Abu Ahmad | Esther Adenuga | Filbert Aurelian Tjiaranata | Weerayut Buaphet | Michael Anugraha | Sowmya Vajjala | Benjamin L Rice | Azril Hafizi Amirudin | Jesujoba Oluwadara Alabi | Srikant Panda | Yassine Toughrai | Bruhan Kyomuhendo | Daniel Ruffinelli | Akshata | Manuel Goul\~ao | Ej Zhou | Ingrid Gabriela Franco Ramirez | Cristina Aggazzotti | Konstantin Dobler | Jun Kevin | Quentin Pag\`es | Nicholas Andrews | Nuhu Ibrahim | Mattes Ruckdeschel | Amr Keleg | Mike Zhang | Casper Rufaro Muziri | Saron Samuel | Sotaro Takeshita | Kun Kerdthaisong | Luca Foppiano | Rasul Dent | Tommaso Green | Ahmad Mustapha Wali | Kamohelo Makaaka | Vicky Feliren | Inshirah Idris | Hande Celikkanat | Abdulhamid Abubakar | Jean Maillard | Beno{\^\i}t Sagot | Thibault Cl\'erice | Kenton Murray | Sarah K. K. Luger
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language identification (LID) is a fundamental step in curating multilingual corpora. However, LID models still perform poorly for many languages, especially on the noisy and heterogeneous web data often used to train multilingual language models. In this paper, we introduce CommonLID, a community-driven, human-annotated LID benchmark for the web domain, covering 109 languages. Many of the included languages have been previously under-served, making CommonLID a key resource for developing more representative high-quality text corpora. We show CommonLID’s value by using it, alongside five other common evaluation sets, to test eight popular LID models. We analyse our results to situate our contribution and to provide an overview of the state of the art. In particular, we highlight that existing evaluations overestimate LID accuracy for many languages in the web domain. We make CommonLID and the code used to create it available under an open, permissive license.
2025
Evaluating Cultural Knowledge and Reasoning in LLMs Through Persian Allusions
Melika Nobakhtian | Yadollah Yaghoobzadeh | Mohammad Taher Pilehvar
Findings of the Association for Computational Linguistics: EMNLP 2025
Melika Nobakhtian | Yadollah Yaghoobzadeh | Mohammad Taher Pilehvar
Findings of the Association for Computational Linguistics: EMNLP 2025
Allusion recognition—a task demanding contextual activation of cultural knowledge—serves as a critical test of LLMs’ ability to deploy stored information in open-ended, figurative settings. We introduce a framework for evaluating Persian literary allusions through (1) classical poetry annotations and (2) LLM-generated texts incorporating allusions in novel contexts. By combining knowledge assessments, multiple-choice tasks, and open-ended recognition, we analyze whether failures stem from knowledge gaps or activation challenges. Evaluations across eleven LLMs highlight a notable observation: models exhibit strong foundational knowledge and high multiple-choice accuracy, yet performance drops substantially in open-ended tasks, especially for indirect references. Reasoning-optimized models generalize better to novel contexts, whereas distilled models show marked degradation in cultural reasoning. The gap underscores that LLMs’ limitations arise not from missing knowledge but from difficulties in spontaneously activating cultural references without explicit cues. We propose allusion recognition as a benchmark for contextual knowledge deployment, highlighting the need for training paradigms that bridge factual recall and culturally grounded reasoning. Our code, datasets and results are available at https://github.com/MelikaNobakhtian/Allusion
2023
IUST at ImageArg: The First Shared Task in Multimodal Argument Mining
Melika Nobakhtian | Ghazal Zamaninejad | Erfan Moosavi Monazzah | Sauleh Eetemadi
Proceedings of the 10th Workshop on Argument Mining
Melika Nobakhtian | Ghazal Zamaninejad | Erfan Moosavi Monazzah | Sauleh Eetemadi
Proceedings of the 10th Workshop on Argument Mining
ImageArg is a shared task at the 10th ArgMining Workshop at EMNLP 2023. It leverages the ImageArg dataset to advance multimodal persuasiveness techniques. This challenge comprises two distinct subtasks: 1) Argumentative Stance (AS) Classification: Assessing whether a given tweet adopts an argumentative stance. 2) Image Persuasiveness (IP) Classification: Determining if the tweet image enhances the persuasive quality of the tweet. We conducted various experiments on both subtasks and ranked sixth out of the nine participating teams.
Search
Fix author
Co-authors
- Idris Abdulmumin 1
- Abdulhamid Abubakar 1
- Faisal Muhammad Adam 1
- Esther Adenuga 1
- Amit Agarwal 1
- Cristina Aggazzotti 1
- Raia Abu Ahmad 1
- Akshata 1
- Hend Al-Khalifa 1
- Jesujoba Alabi 1
- Reem Alqifari 1
- Azril Hafizi Amirudin 1
- Cynthia Jayne Amol 1
- Nicholas Andrews 1
- David Anugraha 1
- Michael Anugraha 1
- Catherine Arnett 1
- Mithil Bangera 1
- Yeshil Bangera 1
- Weerayut Buaphet 1
- Laurie Burchell 1
- Lanwenn ar C'horr 1
- Hande Celikkanat 1
- Kranti Chalamalasetti 1
- Leshem Choshen 1
- Thibault Cl\'erice 1
- Rasul Dent 1
- Konstantin Dobler 1
- Karan Dua 1
- Sauleh Eetemadi 1
- Vicky Feliren 1
- Luca Foppiano 1
- Dmitry Gaynullin 1
- Manuel Goul\~ao 1
- Tommaso Green 1
- Muhammad Ravi Shulthan Habibi 1
- Nadia Ghezaiel Hammouda 1
- Ikhlasul Akmal Hanif 1
- Nuhu Ibrahim 1
- Inshirah Idris 1
- Fenal Ashokbhai Ilasariya 1
- Joseph Marvin Imperial 1
- Amr Keleg 1
- Kun Kerdthaisong 1
- Ilker Kesen 1
- Jun Kevin 1
- Bruhan Kyomuhendo 1
- Yiyuan Li 1
- Sarah K. K. Luger 1
- Jean Maillard 1
- Kamohelo Makaaka 1
- Vukosi Marivate 1
- Juan Pablo Martínez 1
- Erfan Moosavi Monazzah 1
- Sara Hincapi\'e Monsalve 1
- Rafael Mosquera 1
- Carol Muchemi 1
- Shamsuddeen Hassan Muhammad 1
- Kenton Murray 1
- Ahmad Mustafid 1
- Casper Rufaro Muziri 1
- Hamada Nayel 1
- Khang Nguyen 1
- My Chiffon Nguyen 1
- Shu Okabe 1
- Pedro Ortiz Suarez 1
- Malte Ostendorff 1
- Verrah Akinyi Otiende 1
- Quentin Pag\`es 1
- Srikant Panda 1
- Hitesh Laxmichand Patel 1
- Mohammad Taher Pilehvar 1
- Vallerie Alexandra Putra 1
- Ingrid Gabriela Franco Ramirez 1
- Benjamin L Rice 1
- Mattes Ruckdeschel 1
- Daniel Ruffinelli 1
- Benoît Sagot 1
- Luis Frentzen Salim 1
- Saron Samuel 1
- Jakhongir Saydaliev 1
- Pavel Stepachev 1
- Damian Stewart 1
- Sotaro Takeshita 1
- Filbert Aurelian Tjiaranata 1
- Atnafu Lambebo Tonja 1
- Yassine Toughrai 1
- Gouthami Vadithya 1
- Sowmya Vajjala 1
- Rob Van Der Goot 1
- Thom Vaughan 1
- Ahmad Mustapha Wali 1
- Azmine Toushik Wasi 1
- Genta Indra Winata 1
- Tack Hwa Wong 1
- Yadollah Yaghoobzadeh 1
- Andrew Yates 1
- Seid Muhie Yimam 1
- Ghazal Zamaninejad 1
- Mike Zhang 1
- Ej Zhou 1