Joel Johnson

2026

PolyTicsTamil_Alchemists@DravidianLangTech@ACL 2026: An Augmentation-Driven Focal Ensemble Model for Political Sentiment Analysis in Tamil
Jyoti Kumari | Meclin A Francis | Vinay Babu Ulli | Malavika Sreekumar | Joel Johnson
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper describes our system submitted to the DravidianLangTech@ACL 2026 shared task on Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments. The task requires classifying Tamil political tweets into seven sentiment categories. We address two key challenges, severe class imbalance and semantic overlap between categories, through a three-stage pipeline. First, we balance the training set by augmenting minority classes via back-translation and transformer-based paraphrasing. Second, we fine-tune XLM-RoBERTa-base using a class-weighted Focal Loss (𝛾=2), which directs learning towards hard, ambiguous samples. Third, we train five models under Stratified 5-Fold Cross-Validation and average their softmax outputs at inference time. On the official test set, the system achieves a Macro-F1 of 0.3539. The code is publicly available at: https://github.com/meclin2345/PolyTicsTamil_Alchemists
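The class-weighted Focal Loss named in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `softmax` and `focal_loss` and the example weights are hypothetical, and it assumes the standard formulation FL(p_t) = -α_t (1 - p_t)^γ log(p_t) with γ = 2, written in plain Python for a single sample.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one sample (illustrative only).

    logits        -- raw scores, one per class
    target        -- index of the gold class
    class_weights -- per-class weights (assumed values, e.g. inverse frequency)
    gamma         -- focusing parameter; gamma=0 gives weighted cross-entropy
    """
    p_t = softmax(logits)[target]
    # (1 - p_t)^gamma shrinks the loss of confidently correct samples,
    # so hard, ambiguous samples dominate the training signal.
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical uniform weights for the seven sentiment classes.
weights = [1.0] * 7
easy = focal_loss([6.0, 0, 0, 0, 0, 0, 0], target=0, class_weights=weights)
hard = focal_loss([0.5, 0.4, 0, 0, 0, 0, 0], target=0, class_weights=weights)
print(easy < hard)
```

With γ = 0 the expression reduces to ordinary weighted cross-entropy; larger γ values down-weight well-classified samples more aggressively.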


Hope_Speech_Alchemists@DravidianLangTech 2026: TF-IDF SVM and XLM-RoBERTa with Focal Loss for Hope Speech Detection in Tulu
Joel Johnson | Meclin A Francis | Jyoti Kumari | Malavika Sreekumar | Vinay Babu Ulli
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper describes our system submitted to the shared task on Hope Speech Detection in Tulu at DravidianLangTech@ACL 2026. The task comprises two sub-tasks: coarse-grained classification into four categories (Task 1) and fine-grained classification into five categories (Task 2). We compare a traditional TF-IDF + LinearSVC baseline against XLM-RoBERTa fine-tuned with minority-class oversampling and Focal Loss. Our experiments reveal an interesting trade-off: while the transformer approach achieves the best validation Macro-F1 of 0.57 on the coarse-grained task, the TF-IDF baseline outperforms it on the smaller fine-grained task, highlighting the data scarcity threshold below which large pre-trained models struggle to generalise. On the official test set, our system achieves a Macro-F1 of 0.55 on Task 1 and 0.40 on Task 2. The code is publicly available at: https://github.com/meclin2345/Hope_Speech_Alchemists


AbuseDetect_Alchemists@DravidianLangTech 2026: A Weighted Transformer Ensemble for Detecting Abusive Tamil Text Targeting Women
Meclin A Francis | Jyoti Kumari | Vinay Babu Ulli | Malavika Sreekumar | Joel Johnson
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper describes our system submitted to the shared task on Abusive Tamil Text Targeting Women on Social Media at DravidianLangTech@ACL 2026. We formulate the problem as a supervised binary classification task, assigning each Tamil social media comment to an Abusive or Non-Abusive category. Our pipeline begins with a tailored preprocessing stage that handles emoji translation, URL removal, and entity normalization. We then independently fine-tune two pre-trained transformer models, MuRIL and XLM-RoBERTa, on the task data. At inference time, we combine these models through a weighted softmax ensemble, assigning a weight of 0.6 to MuRIL and 0.4 to XLM-RoBERTa. The resulting system achieves a Macro-F1 score of 0.8115 on the test set, outperforming both individual models. The code is publicly available at: https://github.com/meclin2345/AbuseDetect_Alchemists
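The weighted softmax ensemble described in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the label order (index 0 = Abusive, index 1 = Non-Abusive), and the example logits are hypothetical; only the 0.6 and 0.4 weights come from the abstract.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_ensemble(muril_logits, xlmr_logits, w_muril=0.6, w_xlmr=0.4):
    """Combine two models' class probabilities with fixed weights.

    Returns (predicted_label_index, combined_probabilities).
    """
    p_muril = softmax(muril_logits)
    p_xlmr = softmax(xlmr_logits)
    combined = [w_muril * a + w_xlmr * b for a, b in zip(p_muril, p_xlmr)]
    # Pick the class with the highest combined probability.
    label = max(range(len(combined)), key=combined.__getitem__)
    return label, combined

# Made-up logits for one comment where the two models disagree.
label, probs = weighted_ensemble([2.0, 0.0], [0.0, 1.0])
print(label, probs)
```

Because the weights sum to 1, the combined scores remain a valid probability distribution, and the more heavily weighted model decides the outcome when the two models disagree with similar confidence.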
