Musa Izzanardi Wijanarko


2025

NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
Muhammad Farid Adilazuarda | Musa Izzanardi Wijanarko | Lucky Susanto | Khumaisa Nur’aini | Derry Tanti Wijaya | Alham Fikri Aji
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Indonesia is rich in languages and scripts. However, most NLP progress has been made using romanized text. In this paper, we present NusaAksara, a novel public benchmark for Indonesian languages that includes their original scripts. Our benchmark covers both text and image modalities and encompasses diverse tasks such as image segmentation, OCR, transliteration, translation, and language identification. Our data is constructed by human experts through rigorous steps. NusaAksara covers 8 scripts across 7 languages, including low-resource languages not commonly seen in NLP benchmarks. Although unsupported by Unicode, the Lampung script is included in this dataset. We benchmark our data across several models, from LLMs and VLMs such as GPT-4o, Llama 3.2, and Aya 23 to task-specific systems such as PP-OCR and LangID, and show that most NLP technologies cannot handle Indonesia’s local scripts, with many achieving near-zero performance.

A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information
Lucky Susanto | Musa Izzanardi Wijanarko | Prasetia Anugrah Pratama | Zilu Tang | Fariz Akyas | Traci Hong | Ika Karlina Idris | Alham Fikri Aji | Derry Tanti Wijaya
Findings of the Association for Computational Linguistics: ACL 2025

Online discourse is increasingly trapped in a vicious cycle where polarizing language fuels toxicity and vice versa. Identity, one of the most divisive issues in modern politics, often increases polarization. Yet, prior NLP research has mostly treated toxicity and polarization as separate problems. In Indonesia, the world’s third-largest democracy, this dynamic threatens democratic discourse, particularly in online spaces. We argue that polarization and toxicity must be studied in relation to each other. To this end, we present a novel multi-label Indonesian dataset annotated for toxicity, polarization, and annotator demographic information. Benchmarking with BERT-base models and large language models (LLMs) reveals that polarization cues improve toxicity classification and vice versa. Including demographic context further enhances polarization classification performance.

2024

Monitoring Hate Speech in Indonesia: An NLP-based Classification of Social Media Texts
Musa Izzanardi Wijanarko | Lucky Susanto | Prasetia Anugrah Pratama | Ika Karlina Idris | Traci Hong | Derry Tanti Wijaya
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Online hate speech propagation is a complex issue, deeply influenced by the cultural, historical, and societal contexts of both the perpetrator and the target. Consequently, developing a universally robust hate speech classifier for diverse social media texts remains a challenging and unsolved task. The lack of mechanisms to track the spread and severity of hate speech further complicates the formulation of effective solutions. To monitor hate speech in Indonesia during the 2024 presidential election, we employed Natural Language Processing (NLP) technologies to create an improved hate speech classifier tailored to a narrower subset of texts: those that target vulnerable groups that have historically been the targets of hate speech in Indonesia. Our focus is on texts that mention six vulnerable minority groups in Indonesia: Shia, Ahmadiyyah, Christians, LGBTQ+, Indonesian Chinese, and people with disabilities, as well as one additional group of interest: Jews. The insights gained from our dashboard have assisted stakeholders in devising more effective strategies to counteract hate speech. Notably, our dashboard has persuaded the General Election Supervisory Body in Indonesia (BAWASLU) to collaborate with our institution and the Alliance of Independent Journalists (AJI) to monitor social media hate speech during the upcoming Indonesian regional elections in vulnerable areas of the country known for hate speech dissemination or hate-related violence. This dashboard is available online at https://aji.or.id/hate-speech-monitoring.