2025
Hierarchical Long-Document Summarization using LED for Legal Judgments
Reshma Sheik | Noah John Puthayathu | Fathima Firose A | Jonathan Paul
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
This paper describes our system for the L-SUMM shared task on legal document summarization. Our approach is built on the Longformer Encoder-Decoder (LED) model, which we augment with a multi-level summarization strategy tailored for legal documents that are substantially longer than typical transformer input limits. The system achieved competitive performance on the legal judgment summarization task through optimized training strategies, including gradient accumulation, Adafactor optimization, and hyperparameter tuning. Our findings indicate that combining hierarchical processing with strategically assigned global attention enables more reliable summarization of lengthy legal texts.
A Hybrid Quantum-Classical Fusion for Deep Semantic Paraphrase Detection
Devanarayanan K | Fayas S Mohamad | Dheeraj V Mohan | Reshma Sheik
Proceedings of the QuantumNLP: Integrating Quantum Computing with Natural Language Processing
Paraphrase detection is a core task in natural language processing (NLP) that aims to determine whether two sentences convey equivalent meanings. This work proposes a hybrid quantum–classical framework that integrates Sentence-BERT embeddings, simulated quantum feature encoding, and classical machine learning models to enhance semantic similarity detection. Sentence pairs are first embedded with Sentence-BERT and standardized through feature scaling. These representations are then transformed via rotation-based quantum circuits to capture higher-order feature interactions and non-linear dependencies. The resulting hybrid feature space, combining classical and quantum-inspired components, is evaluated using LightGBM and deep neural network classifiers. Experimental results show that the hybrid model incorporating quantum-inspired features achieved superior classification performance, yielding a 10% improvement in overall accuracy and outperforming standalone deep learning baselines. These findings demonstrate that quantum–classical fusion enhances semantic feature extraction and significantly improves paraphrase detection performance.
2024
Mitigating Gender Bias in Large Language Models: An Evaluation Using Chain-of-Thought Prompting
Arati Mohapatra | Kavimalar Subbiah | Reshma Sheik | S Jaya Nirmala
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
2023
Mitigating Abusive Comment Detection in Tamil Text: A Data Augmentation Approach with Transformer Model
Reshma Sheik | Raghavan Balanathan | Jaya Nirmala S.
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
With the increasing number of users on social media platforms, the detection and categorization of abusive comments have become crucial, necessitating effective strategies to mitigate their impact on online discussions. However, the intricate and diverse nature of low-resource Indic languages makes it challenging to develop reliable detection methodologies. This research focuses on classifying YouTube comments written in Tamil into various categories. To this end, we experimented with several multilingual transformer-based models alongside data augmentation involving back-translation and other pre-processing techniques. Our work provides valuable insights into the effectiveness of various preprocessing methods for this classification task. Our experiments showed that the Multilingual Representations for Indian Languages (MuRIL) transformer model, coupled with round-trip translation and lexical replacement, yielded the most promising results, achieving an improvement of over 15 points in macro F1-score compared to existing baselines. This contribution adds to the ongoing research on mitigating the adverse impact of abusive content on online platforms, emphasizing the utilization of diverse preprocessing strategies and state-of-the-art language models.
2022
Efficient Deep Learning-based Sentence Boundary Detection in Legal Text
Reshma Sheik | Gokul T | S Nirmala
Proceedings of the Natural Legal Language Processing Workshop 2022
A key component of the Natural Language Processing (NLP) pipeline is Sentence Boundary Detection (SBD). Erroneous SBD can propagate to downstream processing steps and reduce overall performance. In well-edited corpora, a few criteria based on punctuation and capitalization suffice to identify sentence boundaries. However, owing to several grammatical ambiguities, the complex structure of legal text poses difficulties for SBD. In this paper, we train a neural network framework for identifying sentence endings in legal text. We evaluated several state-of-the-art deep learning models, analyzed their performance, and found that a Convolutional Neural Network (CNN) outperformed the other deep learning architectures. We compared the results with rule-based, statistical, and transformer-based frameworks. The best neural network model outscored the popular rule-based framework with an improvement of 8% in F1 score. Although domain-specific statistical models perform slightly better, the trained CNN is 80 times faster at run-time and requires little feature engineering. Furthermore, even after extensive pretraining, the transformer models fall short of the best deep learning model in overall performance.