Amrita Singh

2026

Evaluating Customized vs. Generalist Transformer-based Models for Legal Contract Classification
Amrita Singh | H. Suhan Karaca | Aditya Joshi | Hye-young Paik | Jiaojiao Jiang
Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)

Despite advances in legal NLP, no comprehensive evaluation of Transformer-based models customized for legal tasks (referred to as 'legal-specific' models in this paper) exists for contract classification tasks. To address this gap, we present an evaluation of 13 legal-specific Transformer-based models on 3 English-language contract classification tasks and compare them with 9 generalist models. The results show that legal-specific models consistently outperform generalist models, especially on tasks requiring nuanced legal understanding. They also help reduce misclassification of rare classes in imbalanced datasets. Legal-BERT and Contracts-BERT establish new state-of-the-art results on two of the three tasks, despite having 69% fewer parameters than the best-performing generalist models. We also identify CaseLaw-BERT and LexLM as strong additional baselines for contract classification. Our results highlight the shortcomings of generalist models, emphasizing the need for domain-specific customization, particularly in the context of legal applications.

MedArabs at AbjadMed: Arabic Medical Text Classification via Data- and Algorithm-Level Fusion
Amrita Singh
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

In this work, we address the challenges of Arabic medical text classification, focusing on class imbalance and the complexity of the language’s morphology. We propose a multiclass classification pipeline based on Data- and Algorithm-Level fusion, which integrates the optimal Back Translation technique for data augmentation with the Class Balanced (CB) loss function to enhance performance. The domain-specific AraBERT model is fine-tuned using this approach, achieving competitive results. Our pipeline achieves a Macro-F1 score of 0.4219 on the official test set of the AbjadMed task and 0.4068 on the development set.

2024

Refining App Reviews: Dataset, Methodology, and Evaluation
Amrita Singh | Chirag Jain | Mohit Chaudhary | Preethu Rose Anish
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

With the growing number of mobile users, app development has become increasingly lucrative. Reviews on platforms such as Google Play and Apple App Store provide valuable insights to developers, highlighting bugs, suggesting new features, and offering feedback. However, many reviews contain typos, spelling errors, grammar mistakes, and complex sentences, hindering efficient interpretation and slowing down app improvement processes. To tackle this, we introduce RARE (Repository for App review REfinement), a benchmark dataset of 10,000 annotated pairs of original and refined reviews from 10 mobile applications. These reviews were collaboratively refined by humans and large language models (LLMs). We also conducted an evaluation of eight state-of-the-art LLMs for automated review refinement. The top-performing model (Flan-T5) was further used to refine an additional 10,000 reviews, contributing to RARE as a silver corpus.

Co-authors

H. Suhan Karaca 1

Hye-young Paik 1
