Arpita Mallik

2026

CUET-823 at SemEval-2026 Task 9: LoRA-Based Instruction Fine-Tuning of LLMs vs. Transformer Models for Bengali Polarization Detection
Arpita Mallik | Ratnajit Dhar
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

The rapid growth of social media has gone hand in hand with a sharp increase in heated public discussions, where debates about elections, conflicts, protests, and identity often turn into divisive and polarized rhetoric. In this paper, we present our system for SemEval-2026 Task 9, Subtask 1: Multilingual Text Classification Challenge-Polarization Detection, focusing specifically on the Bengali language. The task is a binary classification problem: determine whether a social media post exhibits attitude polarization, such as intolerance, dehumanization, deindividuation, vilification, or stereotyping toward others’ opinions, identities, or beliefs. Among 49 participating teams, our approach ranked 2nd, achieving a macro-F1 score of 0.8582. We experimented with both transformer-based models and large language models (LLMs), and observed that LoRA-based instruction fine-tuning of LLMs delivered the strongest performance in detecting nuanced and context-dependent polarization in Bengali text.
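
The macro-F1 metric reported in the abstract above averages the per-class F1 scores with equal weight, so the minority class counts as much as the majority class. The following is a minimal illustrative sketch of that computation, not the task's official scorer; the function name `macro_f1` and the toy labels are assumptions made for the example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy binary example: 1 = polarized, 0 = not polarized (made-up labels)
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
print(round(macro_f1(gold, pred), 4))
```

Because every class contributes equally, a system that ignores the rarer class is penalized even if its overall accuracy is high.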

Still Loading@DravidianLangTech 2026: Telugu Prompt-Style Recovery using Multilingual Transformers
Samonwita Sarker | Isnat Mehrin Sami | Priyontee Mojumder | Arpita Mallik | Hasan Murad
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper describes the system that our Still-Loading team designed for the Telugu Prompt-Style Recovery shared task at DravidianLangTech@ACL 2026. The task is to categorize Telugu transcript passages into one of nine communicative styles: Formal, Informal, Optimistic, Pessimistic, Humorous, Serious, Inspiring, Authoritative, and Persuasive. We compared several multilingual Transformer-based models, namely MuRIL, XLM-RoBERTa-Large, mBERT, and IndicBERTv2. We adopted a "Turbo Sandwich" preprocessing strategy, which places more emphasis on lexical deltas, together with Focal Loss. Our MuRIL-based system ranked 7th on the official leaderboard with a macro-F1 score of 0.1703. The source code to reproduce our experiments is publicly available at Still-Loading-Prompt-Recovery-for-LLM-in-Telugu (https://github.com/Priyontee1713/Still-Loading-Prompt-Recovery-for-LLM-in-Telugu).
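
For readers unfamiliar with the Focal Loss mentioned in the abstract above, the sketch below shows the standard per-example form, FL = -alpha * (1 - p_t)^gamma * log(p_t), which down-weights examples the model already classifies confidently. It is a generic illustration, not the authors' implementation; the values gamma = 2.0 and alpha = 1.0 and the toy probabilities are assumptions.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one example given class probabilities."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are illustrative defaults, not values from the paper.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy example (p_t = 0.9) is down-weighted far more than a hard one (p_t = 0.2)
easy = [0.9, 0.1]
hard = [0.2, 0.8]
print(focal_loss(easy, 0) / cross_entropy(easy, 0))
print(focal_loss(hard, 0) / cross_entropy(hard, 0))
```

With gamma set to 0 and alpha set to 1, the expression reduces to ordinary cross-entropy, which makes the role of the modulating factor easy to check.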

Cuet Yet Another Baseline@DravidianLangTech 2026: Shared Task on Prompt Recovery for LLM in Telugu
Rotna Dipika Debnath | Shahrin Afroz Hoque Ruhi | Ayesha Labiba | Arpita Mallik | Hasan Murad
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Prompt recovery in large language models (LLMs) is the task of inferring the communicative intent and stylistic framing of the original instruction from model-generated output. This task is especially challenging for low-resource Dravidian languages such as Telugu, where agglutinative morphology, register variation, and scarce annotated data complicate stylistic modelling. In this paper, we present our system for the Shared Task on Prompt Recovery for LLM in Telugu at DravidianLangTech@ACL 2026, which aims to classify Telugu transcript excerpts into nine communicative style categories: Formal, Informal, Optimistic, Pessimistic, Humorous, Serious, Inspiring, Authoritative, and Persuasive. We have implemented a transformer-based approach using ai4bharat/IndicBERTv2-MLM-only, MuRIL-base, and Telugu-BERT for Telugu communicative style classification. Our system fine-tunes these pretrained Indic language models on the training samples to capture stylistic patterns in Telugu transcripts. Our approach achieved a macro-F1 score of 0.2993 on the evaluation set, demonstrating the potential of Indic-focused pretrained models for stylistic analysis in low-resource language settings. Controlled ablations reveal that label smoothing benefits stronger Indic backbones but degrades weaker ones, and that surface linguistic feature augmentation does not complement rich contextual representations on small datasets.

2025

CUET-823@DravidianLangTech 2025: Shared Task on Multimodal Misogyny Meme Detection in Tamil Language
Arpita Mallik | Ratnajit Dhar | Udoy Das | Momtazul Arefin Labib | Samia Rahman | Hasan Murad
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Misogynous content on social media, especially in memes, presents challenges due to the complex interplay of text and images that carry offensive messages. This difficulty mostly arises from the lack of direct alignment between modalities and biases in large-scale visio-linguistic models. In this paper, we present our system for the Shared Task on Misogyny Meme Detection - DravidianLangTech@NAACL 2025. We have implemented various unimodal models, such as mBERT and IndicBERT for text data, and ViT, ResNet, and EfficientNet for image data. Moreover, we have tried combining these models and finally adopted a multimodal approach that combines mBERT for text and EfficientNet for image features, both fine-tuned to better interpret subtle language and detailed visuals. The fused features are processed through a dense neural network for classification. Our approach achieved an F1 score of 0.78120, securing 4th place and demonstrating the potential of transformer-based architectures and state-of-the-art CNNs for this task.
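
The fusion step described in the abstract above can be pictured roughly as follows: a text feature vector and an image feature vector are joined and passed through a dense layer that outputs class probabilities. This is a toy sketch under stated assumptions, not the authors' architecture: the abstract does not say how the features are fused, so concatenation is assumed here, and the feature sizes, the single dense layer, and the random weights are all hypothetical stand-ins for the fine-tuned mBERT and EfficientNet outputs.

```python
import math
import random


def dense(x, weights, bias):
    """One fully connected layer: weights has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_feat, image_feat, weights, bias):
    """Concatenate the two modality vectors, then classify with a dense layer."""
    fused = list(text_feat) + list(image_feat)  # fusion by concatenation (assumed)
    return softmax(dense(fused, weights, bias))


# Hypothetical sizes: 4 text features, 3 image features, 2 output classes
random.seed(0)
text_feat = [random.uniform(-1, 1) for _ in range(4)]
image_feat = [random.uniform(-1, 1) for _ in range(3)]
weights = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(2)]
bias = [0.0, 0.0]
probs = fuse_and_classify(text_feat, image_feat, weights, bias)
print(probs)
```

In a real system the two input vectors would come from the fine-tuned encoders and the dense layer weights would be learned jointly rather than sampled at random.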

CUET-823 at MAHED 2025 Shared Task: Large Language Model-Based Framework for Emotion, Offensive, and Hate Detection in Arabic
Ratnajit Dhar | Arpita Mallik
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks