Fariz Akyas
2025
A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information
Lucky Susanto | Musa Izzanardi Wijanarko | Prasetia Anugrah Pratama | Zilu Tang | Fariz Akyas | Traci Hong | Ika Karlina Idris | Alham Fikri Aji | Derry Tanti Wijaya
Findings of the Association for Computational Linguistics: ACL 2025
Online discourse is increasingly trapped in a vicious cycle where polarizing language fuels toxicity and vice versa. Identity, one of the most divisive issues in modern politics, often increases polarization. Yet, prior NLP research has mostly treated toxicity and polarization as separate problems. In Indonesia, the world’s third-largest democracy, this dynamic threatens democratic discourse, particularly in online spaces. We argue that polarization and toxicity must be studied in relation to each other. To this end, we present a novel multi-label Indonesian dataset annotated for toxicity, polarization, and annotator demographic information. Benchmarking with BERT-base models and large language models (LLMs) reveals that polarization cues improve toxicity classification and vice versa. Including demographic context further enhances polarization classification performance.
Co-authors
- Alham Fikri Aji 1
- Traci Hong 1
- Ika Karlina Idris 1
- Prasetia Anugrah Pratama 1
- Lucky Susanto 1