Danish Mohammed

2026

FLAICOL: Flip-Point-Led Augmentation for Imbalanced Code-Mixed Offensive Language Detection
Danish Mohammed | Vidhya Kamakshi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Hate speech detection in low-resource, code-mixed languages is challenging because people often switch between scripts and languages within a single post. Offensive content in code-mixed text can take the form of explicit slurs, subtle insults, or fragmented abuse, and is often obscured by spelling variants and Romanized script. These datasets are also subject to class imbalance, with hate speech being the minority class of interest. Targeted augmentation of minority-class samples can mitigate this imbalance and help a model learn better representations for hate speech detection. We propose FLAICOL, a flip-point method that identifies the minimal embedding perturbation that moves an input across the decision boundary, maps it back to discrete text, and retrains on those focused examples. Empirical results show that these interpretable augmentations strengthen Transformer classifiers on low-resource, code-mixed hate speech datasets. Experiments were conducted on the Tamil-English, Malayalam-English, and Kannada-English splits in the Dravidian CodeMix Benchmark.
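
The flip-point idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear decision boundary (where the minimal L2 perturbation has a closed form) and a nearest-neighbour lookup to map the perturbed embedding back to a token. The names `flip_point`, `nearest_token`, and the `overshoot` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def flip_point(x, w, b, overshoot=1e-3):
    """Smallest L2 move that takes x across the linear boundary w.x + b = 0.

    Assumption: a linear classifier. For a Transformer the boundary is
    non-linear and the perturbation would need an iterative search.
    """
    score = float(w @ x + b)
    # Project x onto the hyperplane, then push slightly past it
    # so the predicted class actually changes.
    delta = -(1.0 + overshoot) * score / float(w @ w) * w
    return x + delta

def nearest_token(vec, vocab_embeddings, vocab):
    """Map a continuous embedding back to the closest vocabulary item.

    Assumption: nearest neighbour in Euclidean distance stands in for
    the mapping from embedding space back to discrete text.
    """
    dists = np.linalg.norm(vocab_embeddings - vec, axis=1)
    return vocab[int(np.argmin(dists))]
```

With a toy classifier `w = [1, -1]`, `b = 0` and an input embedding `[2.0, 0.5]`, `flip_point` returns a point just on the other side of the boundary; `nearest_token` then selects the vocabulary item closest to that point, which could serve as a candidate augmentation.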
