Improving aggressiveness detection using a data augmentation technique based on a Diffusion Language Model

Antonio Reyes-Ramírez, Mario Aragón, Fernando Sánchez-Vega, Adrian López-Monroy


Abstract
Cyberbullying has grown in recent years, largely owing to the proliferation of social media users. The phenomenon manifests in various forms, such as hate speech and offensive language, which increases the need for effective detection models. Most approaches rely on supervised algorithms, which have an important drawback: they depend heavily on the availability of ample training data. This paper addresses the problem of insufficient data using data augmentation (DA) techniques. Concretely, we propose a novel data augmentation technique based on a Diffusion Language Model (DLA). We compare our proposed method against well-known DA techniques, such as contextual augmentation and Easy Data Augmentation (EDA). Our findings reveal a slight but promising improvement, leading to more robust results with very low variance. We also provide a comprehensive qualitative analysis of classification errors, along with complementary analyses, shedding light on the nuances of our approach.
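For readers unfamiliar with the EDA baseline named in the abstract, the following is a minimal sketch of two EDA-style token operations (random swap and random deletion). It is not the authors' diffusion-based method and is not taken from the paper; the function names `random_swap`, `random_deletion`, and `augment`, and the default parameter values, are illustrative assumptions.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_aug perturbed copies of text (hypothetical helper).

    Alternates between the two operations; the label of the original
    example would be reused for every copy.
    """
    rng = random.Random(seed)
    tokens = text.split()
    out = []
    for k in range(n_aug):
        if k % 2 == 0:
            out.append(" ".join(random_swap(tokens, n_swaps, rng)))
        else:
            out.append(" ".join(random_deletion(tokens, p_delete, rng)))
    return out
```

Calling `augment("some short training sentence")` would yield four perturbed variants of the input. Whitespace tokenization is used here only for brevity; a real pipeline would likely use the tokenizer of the downstream classifier.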
Anthology ID:
2024.woah-1.13
Volume:
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yi-Ling Chung, Zeerak Talat, Debora Nozza, Flor Miriam Plaza-del-Arco, Paul Röttger, Aida Mostafazadeh Davani, Agostina Calabrese
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
171–177
URL:
https://aclanthology.org/2024.woah-1.13
Cite (ACL):
Antonio Reyes-Ramírez, Mario Aragón, Fernando Sánchez-Vega, and Adrian López-Monroy. 2024. Improving aggressiveness detection using a data augmentation technique based on a Diffusion Language Model. In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), pages 171–177, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Improving aggressiveness detection using a data augmentation technique based on a Diffusion Language Model (Reyes-Ramírez et al., WOAH-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.woah-1.13.pdf