Jyoti Kumari
2026
PolAR Bears at SemEval-2026 Task 9: Parameter-Efficient Fine-Tuning and Cross-Lingual Augmentation for Multilingual Polarization Detection
Vinay Ulli | Jyoti Kumari
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our system for SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization. We focus on four low-resource Indian languages (Hindi, Bengali, Telugu, and Odia) across three subtasks: Polarization Detection, Type Classification, and Manifestation Identification. To address data scarcity, we employ cross-lingual data augmentation using IndicTrans2, expanding our dataset fourfold. Our unified architecture leverages Qwen3-4B-Instruct optimized via QLoRA, training a linear classification head on masked mean-pooled hidden states with only ∼33M trainable parameters. Our system achieved highly competitive results in Subtask 1, with an average Macro F1 of 0.813 across all languages (peaking at 0.8668 for Telugu). For the complex multi-label frameworks of Subtasks 2 and 3, our results expose a significant pre-training bias within foundational LLMs; while Hindi maintained strong F1 scores of 0.7008 and 0.7248, performance dropped considerably for the other three languages, highlighting the ongoing challenges of cross-lingual transfer for nuanced rhetorical techniques.
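The "linear classification head on masked mean-pooled hidden states" mentioned in this abstract can be illustrated with a minimal sketch in plain Python. The function names, the toy dimensions, and the example weights are assumptions made for illustration; the abstract does not give the actual implementation, which would operate on the model's hidden-state tensors.

```python
from typing import List


def masked_mean_pool(hidden_states: List[List[float]],
                     attention_mask: List[int]) -> List[float]:
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must align")
    dim = len(hidden_states[0]) if hidden_states else 0
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                total[i] += value
            count += 1
    denom = max(count, 1)  # avoid division by zero for an all-padding input
    return [t / denom for t in total]


def linear_head(pooled: List[float],
                weights: List[List[float]],
                bias: List[float]) -> List[float]:
    """Hypothetical linear classifier: one logit per row of `weights`."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]


# Toy example: three token vectors, the last one is padding.
pooled = masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
logits = linear_head(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])
```

The mask keeps padding tokens from diluting the sentence representation, which matters when batches mix short and long inputs.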
TeamV at LT-EDI 2026: Multilingual Hate Speech Span Detection and Counter-Narrative Generation via Few-Shot In-Context Learning
Vinay Babu Ulli | Jyoti Kumari
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
This paper describes the system developed by TeamV for the LT-EDI 2026 Shared Task on Counter-Narrative Generation on Homophobic Transphobic Comments. The shared task comprises two subtasks: (1) Hate Speech Span Detection in English, Tamil, and Hindi, and (2) Counter-Narrative Generation in English and Tamil. Our system leverages the reasoning and multilingual capabilities of a large proprietary language model (Qwen3-Max) through rigorous few-shot in-context learning (ICL) and robust post-processing mechanisms. Our submitted system demonstrated state-of-the-art performance on the official CodaBench leaderboard. In Task 1, our approach achieved 1st Place across all three languages, securing macro F1 scores of 0.5338 in English, 0.5272 in Tamil, and 0.5478 in Hindi. For Task 2, our generated counter-narratives ranked 1st globally in English with an overall average score of 87.47% and 5th in Tamil. We present our prompting methodology, robust span-matching pipeline, detailed official results, and an analysis of the model’s performance across diverse languages.
PolyTicsTamil_Alchemists@DravidianLangTech@ACL 2026: An Augmentation-Driven Focal Ensemble Model for Political Sentiment Analysis in Tamil
Jyoti Kumari | Meclin A Francis | Vinay Babu Ulli | Malavika Sreekumar | Joel Johnson
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper describes our system submitted to the DravidianLangTech@ACL 2026 shared task on Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments. The task requires classifying Tamil political tweets into seven sentiment categories. We address two key challenges, severe class imbalance and semantic overlap between categories, through a three-stage pipeline. First, we balance the training set by augmenting minority classes via back-translation and transformer-based paraphrasing. Second, we fine-tune XLM-RoBERTa-base using a class-weighted Focal Loss (𝛾=2), which directs learning towards hard, ambiguous samples. Third, we train five models under Stratified 5-Fold Cross-Validation and average their softmax outputs at inference time. On the official test set, the system achieves a Macro-F1 of 0.3539. The code is publicly available at: https://github.com/meclin2345/PolyTicsTamil_Alchemists
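The second and third stages described in this abstract (class-weighted Focal Loss with γ=2, then averaging softmax outputs across folds) can be sketched for a single example in plain Python. This is an illustration under assumptions: the function names and the class-weight values are hypothetical, and only γ=2 and the use of softmax averaging come from the abstract.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: List[float], target: int,
               class_weights: List[float], gamma: float = 2.0) -> float:
    """Class-weighted focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def average_fold_probs(fold_probs: List[List[float]]) -> List[float]:
    """Mean of per-fold softmax outputs for one example."""
    n_folds = len(fold_probs)
    return [sum(col) / n_folds for col in zip(*fold_probs)]


# A confidently correct example contributes far less loss than a hard one.
easy = focal_loss([5.0, 0.0], target=0, class_weights=[1.0, 1.0])
hard = focal_loss([0.0, 5.0], target=0, class_weights=[1.0, 1.0])
```

With `gamma=0` and unit weights the expression reduces to ordinary cross-entropy, which is a quick sanity check on any implementation.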
Hope_Speech_Alchemists@DravidianLangTech 2026: TF-IDF SVM and XLM-RoBERTa with Focal Loss for Hope Speech Detection in Tulu
Joel Johnson | Meclin A Francis | Jyoti Kumari | Malavika Sreekumar | Vinay Babu Ulli
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper describes our system submitted to the shared task on Hope Speech Detection in Tulu at DravidianLangTech@ACL 2026. The task comprises two sub-tasks: coarse-grained classification into four categories (Task 1) and fine-grained classification into five categories (Task 2). We compare a traditional TF-IDF + LinearSVC baseline against XLM-RoBERTa fine-tuned with minority-class oversampling and Focal Loss. Our experiments reveal an interesting trade-off: while the transformer approach achieves the best validation Macro-F1 of 0.57 on the coarse-grained task, the TF-IDF baseline outperforms it on the smaller fine-grained task, highlighting the data scarcity threshold below which large pre-trained models struggle to generalise. On the official test set, our system achieves a Macro-F1 of 0.55 on Task 1 and 0.40 on Task 2. The code is publicly available at: https://github.com/meclin2345/Hope_Speech_Alchemists
AbuseDetect_Alchemists@DravidianLangTech 2026: A Weighted Transformer Ensemble for Detecting Abusive Tamil Text Targeting Women
Meclin A Francis | Jyoti Kumari | Vinay Babu Ulli | Malavika Sreekumar | Joel Johnson
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper describes our system submitted to the shared task on Abusive Tamil Text Targeting Women on Social Media at DravidianLangTech@ACL 2026. We formulate the problem as a supervised binary classification task, assigning each Tamil social media comment to an Abusive or Non-Abusive category. Our pipeline begins with a tailored preprocessing stage that handles emoji translation, URL removal, and entity normalization. We then independently fine-tune two pre-trained transformer models, MuRIL and XLM-RoBERTa, on the task data. At inference time, we combine these models through a weighted softmax ensemble, assigning a weight of 0.6 to MuRIL and 0.4 to XLM-RoBERTa. The resulting system achieves a Macro-F1 score of 0.8115 on the test set, outperforming both individual models. The code is publicly available at: https://github.com/meclin2345/AbuseDetect_Alchemists
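The weighted softmax ensemble in this abstract amounts to a convex combination of the two models' class probabilities. A minimal sketch follows, assuming each model has already produced softmax probabilities; the function name, the label order (index 0 = Non-Abusive, index 1 = Abusive), and the example probabilities are hypothetical, while the 0.6 and 0.4 weights are taken from the abstract.

```python
from typing import List, Tuple


def weighted_ensemble(probs_muril: List[float],
                      probs_xlmr: List[float],
                      w_muril: float = 0.6,
                      w_xlmr: float = 0.4) -> Tuple[int, List[float]]:
    """Combine two softmax vectors and return (predicted label, combined probs)."""
    if len(probs_muril) != len(probs_xlmr):
        raise ValueError("both models must score the same set of classes")
    combined = [w_muril * a + w_xlmr * b
                for a, b in zip(probs_muril, probs_xlmr)]
    label = max(range(len(combined)), key=combined.__getitem__)
    return label, combined


# The models disagree here; the combined score decides the label.
label, combined = weighted_ensemble([0.4, 0.6], [0.7, 0.3])
```

Because the weights sum to 1, the combined vector is still a probability distribution, so it can be thresholded or compared across examples like a single model's output.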
Team Aurum at MedExACT 2026@ACL: Data Augmentation and Clinical Longformer Fine-Tuning for Medical Decision Extraction
Jyoti Kumari | Vinay Ulli | Anindita Mondal
Proceedings of the BioNLP 2026 (Shared Tasks)
This paper describes the system submitted by team Aurum to the Medical Decision Extraction, Analysis, and Classification Task (MedExACT) at BioNLP 2026. The task requires the extraction and classification of contiguous text spans representing medical decisions from lengthy ICU discharge summaries. To address the dual challenges of long document lengths and severe class imbalance within a limited training set of 350 notes, we propose a two-pronged strategy. First, we employ a tripartite data augmentation pipeline utilizing rule-based entity replacement, LLM-based contextual paraphrasing, and synthetic note generation to expand the training data to over 2,300 notes. Second, we fine-tune a domain-specific Clinical Longformer model equipped with a sliding-window inference mechanism and Focal Loss to handle sequences up to 2,048 tokens while focusing on rare decision categories. Paired with a targeted post-processing module, our system achieved a Final Score of 0.5251, demonstrating high token-level detection (Token F1: 0.6311) and strong stability across patient demographics.
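Sliding-window inference over long notes can be illustrated by the window bookkeeping alone. The sketch below computes overlapping token ranges; the 2,048-token window comes from the abstract, while the stride of 1,024 and the function name are assumptions, and the abstract does not say how predictions in overlapping regions are merged.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int = 2048,
                    stride: int = 1024) -> List[Tuple[int, int]]:
    """Return half-open (start, end) token ranges that cover the whole note."""
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window")
    if n_tokens <= 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:  # the last window reaches the end of the note
            break
        start += stride
    return spans


# A hypothetical 5,000-token discharge summary needs four overlapping windows.
spans = sliding_windows(5000)
```

Keeping the stride smaller than the window means every token away from the document edges is seen with context on both sides in at least one window.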
TypeCoT at UZH Shared Task 2026: Reconstructing Argumentative Structure in UN Resolutions using Type-Informed Chain-of-Thought
Chandan Kumar R S | Vinay Babu Ulli | Jyoti Kumari | Vaibhav Singh
Proceedings of the 13th Workshop on Argument Mining and Reasoning
United Nations and UNESCO resolutions encode complex collective reasoning through highly structured preambles and operative clauses. Reconstructing this implicit argumentative structure is a challenging natural language processing task. This paper describes our submission to the UZH Shared Task at the ArgMining Workshop 2026. Adhering to the strict constraint of using open-weight models with at most 8B parameters, we propose a highly efficient, modular pipeline built entirely upon the Qwen-2.5-7B-Instruct architecture. To address Subtask 1, we decouple the problem, employing a 4-bit quantized LoRA adapter via the Unsloth framework for paragraph type classification and a type-informed chain-of-thought approach for thematic tagging and relation prediction.
2023
JA-NLP@LT-EDI-2023: Empowering Mental Health Assessment: A RoBERTa-Based Approach for Depression Detection
Jyoti Kumari | Abhinav Kumar
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Depression, a widespread mental health disorder, affects a significant portion of the global population. Timely identification and intervention play a crucial role in ensuring effective treatment and support. Therefore, this research paper proposes a fine-tuned RoBERTa-based model for identifying depression in social media posts. In addition to the proposed model, Sentence-BERT is employed to encode social media posts into vector representations. These encoded vectors are then utilized in eight different popular classical machine learning models. The proposed fine-tuned RoBERTa model achieved a best macro F1-score of 0.55 for the development dataset and a comparable score of 0.41 for the testing dataset. Additionally, combining Sentence-BERT with Naive Bayes (S-BERT + NB) outperformed the fine-tuned RoBERTa model, achieving a slightly higher macro F1-score of 0.42. This demonstrates the effectiveness of the approach in detecting depression from social media posts.