Hanif Muhammad Zhafran - ACL Anthology

Hanif Muhammad Zhafran

2025

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Shamsuddeen Hassan Muhammad | Nedjma Ousidhoum | Idris Abdulmumin | Jan Philip Wahle | Terry Ruas | Meriem Beloucif | Christine de Kock | Nirmal Surange | Daniela Teodorescu | Ibrahim Said Ahmad | David Ifeoluwa Adelani | Alham Fikri Aji | Felermino D. M. A. Ali | Ilseyar Alimova | Vladimir Araujo | Nikolay Babakov | Naomi Baes | Ana-Maria Bucur | Andiswa Bukula | Guanqun Cao | Rodrigo Tufiño | Rendi Chevi | Chiamaka Ijeoma Chukwuneke | Alexandra Ciobotaru | Daryna Dementieva | Murja Sani Gadanya | Robert Geislinger | Bela Gipp | Oumaima Hourrane | Oana Ignat | Falalu Ibrahim Lawan | Rooweither Mabuya | Rahmad Mahendra | Vukosi Marivate | Alexander Panchenko | Andrew Piper | Charles Henrique Porto Ferreira | Vitaly Protasov | Samuel Rutunda | Manish Shrivastava | Aura Cristina Udrea | Lilian Diana Awuor Wanzare | Sophie Wu | Florian Valentin Wunderlich | Hanif Muhammad Zhafran | Tianhui Zhang | Yi Zhou | Saif M. Mohammad
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

People worldwide use language in subtle and complex ways to express emotions. Although emotion recognition, an umbrella term for several NLP tasks, impacts various applications within NLP and beyond, most work in this area has focused on high-resource languages. This has led to significant disparities in research efforts and proposed solutions, particularly for under-resourced languages, which often lack high-quality annotated datasets. In this paper, we present BRIGHTER, a collection of multi-labeled, emotion-annotated datasets in 28 different languages and across several domains. BRIGHTER primarily covers low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances labeled by fluent speakers. We highlight the challenges related to the data collection and annotation processes, and then report experimental results for monolingual and crosslingual multi-label emotion identification, as well as emotion intensity recognition. We analyse the variability in performance across languages and text domains, both with and without the use of LLMs, and show that the BRIGHTER datasets represent a meaningful step towards addressing the gap in text-based emotion recognition.

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Samuel Cahyawijaya | Holy Lovenia | Joel Ruben Antony Moniz | Tack Hwa Wong | Mohammad Rifqi Farhansyah | Thant Thiri Maung | Frederikus Hudi | David Anugraha | Muhammad Ravi Shulthan Habibi | Muhammad Reza Qorib | Amit Agarwal | Joseph Marvin Imperial | Hitesh Laxmichand Patel | Vicky Feliren | Bahrul Ilmi Nasution | Manuel Antonio Rufino | Genta Indra Winata | Rian Adam Rajagede | Carlos Rafael Catalan | Mohamed Fazli Mohamed Imam | Priyaranjan Pattnayak | Salsabila Zahirah Pranida | Kevin Pratama | Yeshil Bangera | Adisai Na-Thalang | Patricia Nicole Monderin | Yueqi Song | Christian Simon | Lynnette Hui Xian Ng | Richardy Lobo Sapan | Taki Hasan Rafi | Bin Wang | Supryadi | Kanyakorn Veerakanjana | Piyalitt Ittichaiwong | Matthew Theodore Roque | Karissa Vincentio | Takdanai Kreangphet | Phakphum Artkaew | Kadek Hendrawan Palgunadi | Yanzhi Yu | Rochana Prih Hastuti | William Nixon | Mithil Bangera | Adrian Xuan Wei Lim | Aye Hninn Khine | Hanif Muhammad Zhafran | Teddy Ferdinan | Audra Aurora Izzani | Ayushman Singh | Evan Evan | Jauza Akbar Krito | Michael Anugraha | Fenal Ashokbhai Ilasariya | Haochen Li | John Amadeo Daniswara | Filbert Aurelian Tjiaranata | Eryawan Presma Yulianrifat | Can Udomcharoenchaikit | Fadil Risdian Ansori | Mahardika Krisna Ihsani | Giang Nguyen | Anab Maulana Barik | Dan John Velasco | Rifo Ahmad Genadi | Saptarshi Saha | Chengwei Wei | Isaiah Edri W. Flores | Kenneth Chen Ko Han | Anjela Gail D. Santos | Wan Shen Lim | Kaung Si Phyo | Tim Santos | Meisyarah Dwiastuti | Jiayun Luo | Jan Christian Blaise Cruz | Ming Shan Hee | Ikhlasul Akmal Hanif | M.Alif Al Hakim | Muhammad Rizky Sya’ban | Kun Kerdthaisong | Lester James Validad Miranda | Fajri Koto | Tirana Noor Fatyanosa | Alham Fikri Aji | Jostin Jerico Rosal | Jun Kevin | Robert Wijaya | Onno P. Kampman | Ruochen Zhang | Börje F. Karlsson | Peerat Limkonchotiwat
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite Southeast Asia’s (SEA) extraordinary linguistic and cultural diversity, the region remains significantly underrepresented in vision-language (VL) research, resulting in AI models that inadequately capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing culturally relevant high-quality datasets for SEA languages. By involving contributors from SEA countries, SEA-VL ensures better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages and cultural depictions in VL research. Our methodology employed three approaches: community-driven crowdsourcing with SEA contributors, automated image crawling, and synthetic image generation. We evaluated each method’s effectiveness in capturing cultural relevance. We found that image crawling achieves approximately 85% cultural relevance while being more cost- and time-efficient than crowdsourcing, whereas synthetic image generation failed to accurately reflect SEA cultural nuances and contexts. Collectively, we gathered 1.28 million SEA culturally relevant images, more than 50 times larger than other existing datasets. This work bridges the representation gap in SEA, establishes a foundation for developing culturally aware AI systems for this region, and provides a replicable framework for addressing representation gaps in other underrepresented regions.

Co-authors

Felermino D. M. A. Ali 1

Ilseyar Alimova 1

Fadil Risdian Ansori 1

David Anugraha 1

Michael Anugraha 1

Vladimir Araujo 1

Phakphum Artkaew 1

Nikolay Babakov 1

Yeshil Bangera 1

Mithil Bangera 1

Anab Maulana Barik 1

Meriem Beloucif 1

Ana-Maria Bucur 1

Andiswa Bukula 1

Samuel Cahyawijaya 1

Carlos Rafael Catalan 1

Chiamaka Ijeoma Chukwuneke 1

Alexandra Ciobotaru 1

Jan Christian Blaise Cruz 1

John Amadeo Daniswara 1

Christine De Kock 1

Daryna Dementieva 1

Meisyarah Dwiastuti 1

Mohammad Rifqi Farhansyah 1

Tirana Noor Fatyanosa 1

Vicky Feliren 1

Teddy Ferdinan 1

Charles Henrique Porto Ferreira 1

Isaiah Edri W. Flores 1

Murja Sani Gadanya 1

Robert Geislinger 1

Rifo Ahmad Genadi 1

Muhammad Ravi Shulthan Habibi 1

M.Alif Al Hakim 1

Kenneth Chen Ko Han 1

Ikhlasul Akmal Hanif 1

Rochana Prih Hastuti 1

Ming Shan Hee 1

Oumaima Hourrane 1

Frederikus Hudi 1

Mahardika Krisna Ihsani 1

Fenal Ashokbhai Ilasariya 1

Mohamed Fazli Mohamed Imam 1

Joseph Marvin Imperial 1

Piyalitt Ittichaiwong 1

Audra Aurora Izzani 1

Onno P. Kampman 1

Börje F. Karlsson 1

Kun Kerdthaisong 1

Aye Hninn Khine 1

Takdanai Kreangphet 1

Jauza Akbar Krito 1

Falalu Ibrahim Lawan 1

Adrian Xuan Wei Lim 1

Peerat Limkonchotiwat 1

Rooweither Mabuya 1

Rahmad Mahendra 1

Vukosi Marivate 1

Thant Thiri Maung 1

Lester James Validad Miranda 1

Saif Mohammad 1

Patricia Nicole Monderin 1

Joel Ruben Antony Moniz 1

Shamsuddeen Hassan Muhammad 1

Adisai Na-Thalang 1

Bahrul Ilmi Nasution 1

Lynnette Hui Xian Ng 1

William Nixon 1

Nedjma Ousidhoum 1

Kadek Hendrawan Palgunadi 1

Alexander Panchenko 1

Hitesh Laxmichand Patel 1

Priyaranjan Pattnayak 1

Kaung Si Phyo 1

Salsabila Zahirah Pranida 1

Kevin Pratama 1

Vitaly Protasov 1

Muhammad Reza Qorib 1

Taki Hasan Rafi 1

Rian Adam Rajagede 1

Matthew Theodore Roque 1

Jostin Jerico Rosal 1

Manuel Antonio Rufino 1

Samuel Rutunda 1

Saptarshi Saha 1

Anjela Gail D. Santos 1

Richardy Lobo Sapan 1

Manish Shrivastava 1

Christian Simon 1

Ayushman Singh 1

Nirmal Surange 1

Muhammad Rizky Sya’ban 1

Daniela Teodorescu 1

Filbert Aurelian Tjiaranata 1

Rodrigo Tufiño 1

Can Udomcharoenchaikit 1

Aura Cristina Udrea 1

Kanyakorn Veerakanjana 1

Dan John Velasco 1

Karissa Vincentio 1

Jan Philip Wahle 1

Lilian Diana Awuor Wanzare 1

Robert Wijaya 1

Genta Indra Winata 1

Tack Hwa Wong 1

Florian Valentin Wunderlich 1

Eryawan Presma Yulianrifat 1

Tianhui Zhang 1

Ruochen Zhang 1

Venues

acl 2