2025
PoseStitch-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
Abhinav Joshi | Vaibhav Sharma | Sanjeet Singh | Ashutosh Modi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Sign language translation remains a challenging task due to the scarcity of large-scale, sentence-aligned datasets. Prior work has focused on various feature extraction methods and architectural changes to support neural machine translation for sign languages. We propose PoseStitch-SLT, a novel pre-training scheme inspired by linguistic-template-based sentence generation. Comparing translation quality on two sign language datasets, How2Sign and iSign, we show that a simple transformer-based encoder-decoder architecture outperforms prior work when template-generated sentence pairs are included in training. We achieve BLEU-4 improvements from 1.97 to 4.56 on How2Sign and from 0.55 to 3.43 on iSign, surpassing prior state-of-the-art methods for pose-based gloss-free translation. The results demonstrate the effectiveness of template-driven synthetic supervision in low-resource sign language settings.
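A minimal sketch of what template-driven pose stitching could look like is given below; the pose bank, template format, and vocabularies are hypothetical illustrations, not the paper's actual pipeline.

```python
import random
import numpy as np

# Hypothetical word-level pose bank: word -> pose clip of shape (frames, keypoints, 2).
pose_bank = {
    "I": np.random.rand(12, 75, 2),
    "go": np.random.rand(15, 75, 2),
    "school": np.random.rand(18, 75, 2),
    "book": np.random.rand(14, 75, 2),
    "read": np.random.rand(16, 75, 2),
}

# Hypothetical linguistic template: a gloss pattern with slots plus a text pattern.
templates = [
    (["I", "<VERB>", "<NOUN>"], "I {verb} {noun}"),
]
verbs = {"go": "go to", "read": "read a"}
nouns = ["school", "book"]

def generate_pair():
    """Stitch word-level pose clips into one synthetic sentence-level training pair."""
    slots, text_tpl = random.choice(templates)
    verb = random.choice(list(verbs))
    noun = random.choice(nouns)
    gloss_seq = [w if not w.startswith("<") else (verb if w == "<VERB>" else noun)
                 for w in slots]
    # Concatenate the per-word pose clips along the time axis to form the "sentence" signing.
    poses = np.concatenate([pose_bank[g] for g in gloss_seq], axis=0)
    sentence = text_tpl.format(verb=verbs[verb], noun=noun)
    return poses, sentence

poses, sentence = generate_pair()
print(poses.shape, "->", sentence)
```

Such pairs would then serve as synthetic supervision for pre-training an encoder-decoder translation model before fine-tuning on the real sentence-aligned data.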
Calibration Across Layers: Understanding Calibration Evolution in LLMs
Abhinav Joshi | Areeb Ahmad | Ashutosh Modi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have demonstrated inherent calibration capabilities, where predicted probabilities align well with correctness, despite prior findings that deep neural networks are often overconfident. Recent studies have linked this behavior to specific components in the final layer, such as entropy neurons and the unembedding matrix’s null space. In this work, we provide a complementary perspective by investigating how calibration evolves throughout the network’s depth. Analyzing multiple open-weight models on the MMLU benchmark, we uncover a distinct confidence correction phase in the upper/later layers, where model confidence is actively recalibrated after decision certainty has been reached. Furthermore, we identify a low-dimensional calibration direction in the residual stream whose perturbation significantly improves calibration metrics (ECE and MCE) without harming accuracy. Our findings suggest that calibration is a distributed phenomenon, shaped throughout the network’s forward pass, not just in its final projection, providing new insights into how confidence-regulating mechanisms operate within LLMs.
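For reference, expected and maximum calibration error (ECE and MCE) over confidence bins can be computed along these lines; this is a generic sketch, not the paper's evaluation code.

```python
import numpy as np

def ece_mce(confidences, correct, n_bins=10):
    """Expected and maximum calibration error over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight each bin by its fraction of samples
        mce = max(mce, gap)
    return ece, mce

# Example: model confidences for its predicted answers and whether each was correct.
print(ece_mce([0.9, 0.8, 0.6, 0.95, 0.4], [1, 1, 0, 1, 1]))
```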
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
Ashutosh Modi | Saptarshi Ghosh | Asif Ekbal | Pawan Goyal | Sarika Jain | Abhinav Joshi | Shivani Mishra | Debtanu Datta | Shounak Paul | Kshetrimayum Boynao Singh | Sandeep Kumar
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
Overview of the 1st Workshop on NLP for Empowering Justice
Ashutosh Modi | Saptarshi Ghosh | Asif Ekbal | Pawan Goyal | Sarika Jain | Abhinav Joshi | Shivani Mishra | Debtanu Datta | Shounak Paul | Kshetrimayum Boynao Singh | Sandeep Kumar
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
The first iteration of JUST-NLP: Workshop on NLP for Empowering Justice was organized to accelerate research in Natural Language Processing for legal text processing. The inaugural edition, JUST-NLP 2025, was held as a hybrid event at IJCNLP-AACL 2025 on December 24 at IIT Bombay. The program featured a research track, four invited talks, and two shared tasks: (1) L-SUMM, an abstractive summarization task for Indian legal judgments, and (2) L-MT, a legal machine translation task between English and Hindi. The workshop received strong interest from the community, with 29 submissions, of which 21 were accepted. Among the accepted papers, 5 were regular research-track papers published in the proceedings and 2 were accepted as non-archival presentations; for the shared tasks, 9 papers were accepted for L-SUMM and 5 for L-MT, all published in the proceedings. The workshop covered a broad set of Legal NLP challenges, including information extraction, retrieval, multilingual processing, legal reasoning, and applications of large language models. Overall, JUST-NLP 2025 aimed to bring together AI researchers and legal practitioners to develop scalable, domain-aware NLP methods that can support legal workflows and contribute toward more efficient and equitable justice systems.
Findings of the JUST-NLP 2025 Shared Task on Summarization of Indian Court Judgments
Debtanu Datta | Shounak Paul | Kshetrimayum Boynao Singh | Sandeep Kumar | Abhinav Joshi | Shivani Mishra | Sarika Jain | Asif Ekbal | Pawan Goyal | Ashutosh Modi | Saptarshi Ghosh
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
This paper presents an overview of the Shared Task on Summarization of Indian Court Judgments (L-SUMM), hosted by the JUST-NLP 2025 Workshop at IJCNLP-AACL 2025. The task aims to increase research interest in automatic summarization techniques for lengthy and intricate legal documents from the Indian judiciary, particularly court judgments that contain dense legal reasoning and semantic roles that must be preserved in summaries. As part of this shared task, we introduce the Indian Legal Summarization (L-SUMM) dataset, comprising 1,800 Indian court judgments paired with expert-written abstractive summaries, both in English; the task thus focuses on generating high-quality abstractive summaries of court judgments in English. A total of 9 teams participated, exploring a diverse range of methodologies, including transformer-based models, extractive-abstractive hybrids, graph-based ranking approaches, long-context LLMs, and rhetorical-role-based techniques. This paper describes the task setup, dataset, evaluation framework, and our findings. We report the results and highlight key trends across participant approaches, including the effectiveness of hybrid pipelines and challenges in handling extreme sequence lengths.
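Abstractive summarization shared tasks are commonly scored with ROUGE-style overlap metrics; a generic evaluation sketch using the rouge-score package is shown below. The example summaries are invented, and the task's official evaluation framework may use different metrics or settings.

```python
from rouge_score import rouge_scorer  # pip install rouge-score

# Hypothetical reference/system summary pair for one judgment.
reference = "The appeal was dismissed and the conviction under Section 302 was upheld."
prediction = "The court upheld the Section 302 conviction and dismissed the appeal."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, prediction)  # note: reference first, then prediction
for name, result in scores.items():
    print(f"{name}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```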
Findings of the JUST-NLP 2025 Shared Task on English-to-Hindi Legal Machine Translation
Kshetrimayum Boynao Singh | Sandeep Kumar | Debtanu Datta | Abhinav Joshi | Shivani Mishra | Shounak Paul | Pawan Goyal | Sarika Jain | Saptarshi Ghosh | Ashutosh Modi | Asif Ekbal
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
This paper provides an overview of the Shared Task on Legal Machine Translation (L-MT), organized as part of the JUST-NLP 2025 Workshop at IJCNLP-AACL 2025 and aimed at improving the translation of legal texts, a domain where precision, structural faithfulness, and terminology preservation are essential. The training set comprises 50,000 sentences, with 5,000 sentences each in the validation and test sets. The submissions employed strategies such as domain-adaptive fine-tuning of multilingual models, QLoRA-based parameter-efficient adaptation, curriculum-guided supervised training, reinforcement learning with verifiable MT metrics, and from-scratch Transformer training. Systems were evaluated with the BLEU, METEOR, TER, chrF++, BERTScore, and COMET metrics; we also combine these metric scores into an average score (AutoRank). The top-performing system, based on a fine-tuned distilled NLLB-200 model, achieved the highest AutoRank score of 72.1. Domain adaptation consistently yielded substantial improvements over baseline models, and precision-focused rewards proved especially effective for legal MT. The findings also highlight that large multilingual Transformers can deliver accurate and reliable English-to-Hindi legal translations when carefully fine-tuned on legal data, advancing the broader goal of improving access to justice in multilingual settings.
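A rough sketch of corpus-level scoring with sacrebleu and a simple average over normalized metrics follows; the organizers' exact AutoRank formula, metric implementations, and normalization are not specified here, so the combination step is only illustrative, and the example sentences are invented.

```python
import sacrebleu  # pip install sacrebleu

# Hypothetical system outputs and Hindi references.
hyps = ["यह न्यायालय का आदेश है।"]
refs = [["यह अदालत का आदेश है।"]]

bleu = sacrebleu.corpus_bleu(hyps, refs).score                 # 0-100, higher is better
chrf = sacrebleu.corpus_chrf(hyps, refs, word_order=2).score   # chrF++ via word_order=2
ter = sacrebleu.corpus_ter(hyps, refs).score                   # lower is better

# Illustrative "average score": map each metric to a 0-100, higher-is-better scale,
# then take the mean (the shared task's real AutoRank may combine metrics differently).
autorank_like = (bleu + chrf + max(0.0, 100.0 - ter)) / 3
print(f"BLEU={bleu:.1f} chrF++={chrf:.1f} TER={ter:.1f} avg={autorank_like:.1f}")
```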
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi | Areeb Ahmad | Divyaksh Shukla | Ashutosh Modi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Proceedings of the Workshop on Sign Language Processing (WSLP)
Mohammed Hasanuzzaman | Facundo Manuel Quiroga | Ashutosh Modi | Sabyasachi Kamila | Keren Artiaga | Abhinav Joshi | Sanjeet Singh
Proceedings of the Workshop on Sign Language Processing (WSLP)
Overview of the First Workshop on Sign Language Processing (WSLP 2025)
Sanjeet Singh | Abhinav Joshi | Keren Artiaga | Mohammed Hasanuzzaman | Facundo Manuel Quiroga | Sabyasachi Kamila | Ashutosh Modi
Proceedings of the Workshop on Sign Language Processing (WSLP)
We organized the First Workshop on Sign Language Processing (WSLP 2025), co-located with IJCNLP-AACL 2025 at IIT Bombay, to bring together researchers, linguists, and members of the Deaf community and accelerate computational work on under-resourced sign languages. The workshop accepted ten papers (including two official shared-task submissions) that introduced new large-scale resources (a continuous ISL fingerspelling corpus, cross-lingual HamNoSys corpora), advanced multilingual and motion-aware translation models, explored LLM-based augmentation and glossing strategies, and presented lightweight deployable systems for regional languages such as Odia. We ran a three-track shared task on Indian Sign Language that attracted over sixty registered teams and established the first public leaderboards for sentence-level ISL-to-English translation, isolated word recognition, and word-presence prediction. By centring geographic, linguistic, and organiser diversity, releasing open datasets and benchmarks, and explicitly addressing linguistic challenges unique to visual-spatial languages, we significantly broadened the scope of sign-language processing beyond traditionally dominant European and East-Asian datasets, laying a robust foundation for inclusive, equitable, and deployable sign-language AI in the Global South.
2024
IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning
Abhinav Joshi | Shounak Paul | Akshat Sharma | Pawan Goyal | Saptarshi Ghosh | Ashutosh Modi
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing IL-TUR: a Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multi-lingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based ones) for each task, outlining the gap between the models and the ground truth. To foster further research in the legal domain, we create a leaderboard (available at: https://exploration-lab.github.io/IL-TUR/ ) where the research community can upload and compare legal text understanding systems.
CheckersGPT: Learning World Models through Language Modeling
Abhinav Joshi | Vaibhav Sharma | Ashutosh Modi
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Although Large Language Models (LLMs) are trained using only the next-token prediction objective, they have shown impressive performance on various tasks, which has attracted considerable research interest. While one line of past work has suggested that LLMs learn surface-level statistics from the dataset, another line emphasizes that the learned representations are effective for simulating the underlying world model, considering the causal relationships behind next-token prediction. This phenomenon is often referred to as the emergence of a world model in sequence prediction tasks. Recent work has demonstrated it in the simulated setting of board games like Othello and Chess. In this paper, we analyze the game of Checkers to investigate the emergence of a world model in a language model. By training a GPT-style autoregressive language model using only the next-character prediction objective, we find that the model does show hints of learning a world model representation of board positions. We perform our analysis on two datasets: 1) a synthetic dataset derived from the checkers game tree, and 2) a human gameplay dataset. Across multiple models trained with different numbers of layers, we find that increasing the parameter count helps the model learn better world model representations, as decoded by linear probes.
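A linear probe of the kind described can be sketched as follows: freeze the language model, take its activations at move tokens, and fit a single linear layer to predict the state of each board square. The tensor shapes, square encoding, and random placeholders below are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

# Assume `acts` holds frozen model activations at move tokens: (num_positions, d_model),
# and `board` holds the true square states: (num_positions, 32) with values in
# {0: empty, 1: own piece, 2: opponent piece} for the 32 playable checkers squares.
num_positions, d_model, num_squares, num_states = 4096, 512, 32, 3
acts = torch.randn(num_positions, d_model)                            # placeholder activations
board = torch.randint(0, num_states, (num_positions, num_squares))    # placeholder labels

probe = nn.Linear(d_model, num_squares * num_states)  # one linear map, no hidden layer
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = probe(acts).view(num_positions, num_squares, num_states)
    loss = loss_fn(logits.reshape(-1, num_states), board.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    pred = probe(acts).view(num_positions, num_squares, num_states).argmax(-1)
acc = (pred == board).float().mean()
print(f"probe loss {loss.item():.3f}, per-square accuracy {acc:.3f}")
```

Higher probe accuracy on held-out positions is then read as evidence that the board state is linearly decodable from the model's internal representations.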
iSign: A Benchmark for Indian Sign Language Processing
Abhinav Joshi | Romit Mohanty | Mounika Kanakanti | Andesha Mangla | Sudeep Choudhary | Monali Barbate | Ashutosh Modi
Findings of the Association for Computational Linguistics: ACL 2024
Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing has attracted enormous research interest and made tremendous progress in the last few years, sign languages still need to catch up due to the lack of resources. To bridge this gap, in this work we propose iSign: a benchmark for Indian Sign Language (ISL) processing. We make three primary contributions. First, we release one of the largest ISL-English datasets of video-sentence/phrase pairs; to the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with baseline models for easier access by the research community. Third, we provide detailed insights into the proposed benchmarks along with a few linguistic insights into the workings of ISL. We streamline the evaluation of sign language processing, addressing the gaps in the NLP research community for sign languages. We release the dataset, tasks and models via the following website: https://exploration-lab.github.io/iSign/
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
Abhinav Joshi | Shaswati Saha | Divyaksh Shukla | Sriram Vema | Harsh Jhamtani | Manas Gaur | Ashutosh Modi
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) have proven to be a great success in a wide range of applications, from regular NLP-based use cases to AI agents. LLMs are trained on vast corpora of text from various sources; despite the best efforts during the data pre-processing stage, they may pick up undesirable information such as personally identifiable information (PII). Consequently, research in the area of Machine Unlearning (MUL) has recently become active; the main idea is to force LLMs to forget (unlearn) certain information (e.g., PII) without suffering performance loss on regular tasks. In this work, we examine the robustness of existing MUL techniques in their ability to enable leakage-proof forgetting in LLMs. In particular, we examine the effect of data transformation on forgetting, i.e., is an unlearned LLM able to recall forgotten information if there is a change in the format of the input? Our findings on the TOFU dataset highlight the necessity of using diverse data formats to quantify unlearning in LLMs more reliably.
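The format-transformation probing described can be illustrated with a small sketch: take a fact that was supposedly unlearned, rewrite the query in several formats, and check whether the target string resurfaces in the model's output. The generate_fn wrapper, fact fields, and formats below are hypothetical and do not reproduce the TOFU benchmark's actual protocol.

```python
def format_variants(question: str, answer: str, subject: str):
    """Rewrite one QA probe into several input formats (illustrative only)."""
    return [
        question,                                            # original question format
        f"Q: {question}\nA:",                                # QA-style prompt
        question.replace(subject, subject.upper()),          # surface perturbation
        f"Complete the sentence: {subject} is known for",    # cloze-style rewrite
        f"In one line, tell me everything about {subject}.", # open-ended rewrite
    ]

def leaks_forgotten_info(generate_fn, question, answer, subject) -> bool:
    """True if any transformed probe makes the 'unlearned' model emit the answer."""
    return any(answer.lower() in generate_fn(p).lower()
               for p in format_variants(question, answer, subject))

# Usage sketch: `generate_fn` would wrap the unlearned LLM's text generation.
fake_generate = lambda prompt: "I don't have information about that."
print(leaks_forgotten_info(fake_generate,
                           "Where was author Jane Doe born?", "Springfield", "Jane Doe"))
```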
2023
U-CREAT: Unsupervised Case Retrieval using Events extrAcTion
Abhinav Joshi | Akshat Sharma | Sai Kiran Tanikella | Ashutosh Modi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The task of Prior Case Retrieval (PCR) in the legal domain is to automatically cite relevant (based on facts and precedence) prior legal cases for a given query case. To further promote research in PCR, in this paper we propose a new large benchmark (in English) for the PCR task: the IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the length of legal documents, BM25 remains a strong baseline for ranking cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised, event-based retrieval pipeline, U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly improves performance compared to BM25 and makes retrieval considerably faster, making it applicable to real-time case retrieval systems. Our proposed system is generic: we show that it generalizes across two different legal systems (Indian and Canadian) and achieves state-of-the-art performance on the benchmarks for both (the IL-PCR and COLIEE corpora).
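An event-overlap retrieval scheme in this spirit can be sketched as follows: represent each document by a set of extracted events (crudely approximated here as subject-verb-object style tuples from spaCy dependencies) and rank candidates by event-set overlap with the query case. The paper's actual event extraction and scoring will differ; this is only a sketch.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_events(text):
    """Crude event set: (subject lemma, verb lemma, object lemma) triples."""
    events = set()
    for tok in nlp(text):
        if tok.pos_ == "VERB":
            subj = [c.lemma_.lower() for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
            obj = [c.lemma_.lower() for c in tok.children if c.dep_ in ("dobj", "obj", "pobj")]
            for s in subj or ["_"]:
                for o in obj or ["_"]:
                    events.add((s, tok.lemma_.lower(), o))
    return events

def rank_by_event_overlap(query_text, candidates):
    """Rank candidate cases by Jaccard overlap of event sets with the query case."""
    q = extract_events(query_text)
    scores = []
    for doc_id, text in candidates.items():
        c = extract_events(text)
        jaccard = len(q & c) / len(q | c) if (q | c) else 0.0
        scores.append((doc_id, jaccard))
    return sorted(scores, key=lambda x: x[1], reverse=True)

candidates = {"case_1": "The appellant filed an appeal against the order.",
              "case_2": "The tenant paid the rent on time."}
print(rank_by_event_overlap("The petitioner filed an appeal challenging the order.", candidates))
```

Because event sets are much smaller than full token streams, overlap scoring of this kind can be considerably faster than full-text ranking.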
ISLTranslate: Dataset for Translating Indian Sign Language
Abhinav Joshi | Susmit Agrawal | Ashutosh Modi
Findings of the Association for Computational Linguistics: ACL 2023
Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of such resources for Indian Sign Language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end sign-language-to-spoken-language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.
SemEval-2023 Task 6: LegalEval - Understanding Legal Texts
Ashutosh Modi | Prathamesh Kalamkar | Saurabh Karn | Aman Tiwari | Abhinav Joshi | Sai Kiran Tanikella | Shouvik Kumar Guha | Sachin Malhan | Vivek Raghavan
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In populous countries, pending legal cases have been growing exponentially. There is a need to develop NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP, we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. The LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units, Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document, and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with providing an explanation for the prediction. In total, 26 teams (approx. 100 participants spread across the world) submitted system papers. In each of the sub-tasks, the proposed systems outperformed the baselines; however, there is a lot of scope for improvement. This paper describes the tasks and analyzes the techniques proposed by various teams.
2022
CISLR: Corpus for Indian Sign Language Recognition
Abhinav Joshi | Ashwani Bhat | Pradeep S | Priya Gole | Shashwat Gupta | Shreyansh Agarwal | Ashutosh Modi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Indian Sign Language, though used by a diverse community, still lacks well-annotated resources for developing sign language processing systems. In recent years, researchers have actively worked on sign languages like American Sign Language; however, Indian Sign Language is still far from supporting data-driven tasks like machine translation. To address this gap, in this paper we introduce a new dataset, CISLR (Corpus for Indian Sign Language Recognition), for word-level recognition in Indian Sign Language using videos. The corpus has a large vocabulary of around 4700 words covering different topics and domains. Further, we propose a baseline model for word recognition from sign language videos. To handle the low-resource setting of Indian Sign Language, the proposed model consists of a prototype-based one-shot learner that leverages resource-rich American Sign Language to learn generalized features for improving predictions in Indian Sign Language. Our experiments show that gesture features learned in another sign language can help perform one-shot predictions in CISLR.
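The prototype-based one-shot idea can be sketched generically: embed the single reference video per ISL word with a feature extractor (here an untrained placeholder), store it as a class prototype, and classify new videos by nearest prototype under cosine similarity. Everything below is an illustrative assumption, not the paper's model; in practice the encoder would be pre-trained on a resource-rich sign language such as ASL.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Placeholder feature extractor; in practice this would be a gesture encoder
# pre-trained on a resource-rich sign language such as ASL.
def encode_video(video_frames: np.ndarray) -> np.ndarray:
    return video_frames.mean(axis=(0, 1))  # crude pooled feature, shape (feat_dim,)

# One reference example per ISL word -> one prototype per class.
prototypes = {word: encode_video(ref_video)
              for word, ref_video in {
                  "HELLO": np.random.rand(30, 10, 64),
                  "THANKS": np.random.rand(28, 10, 64),
              }.items()}

def predict(video_frames: np.ndarray) -> str:
    """Nearest-prototype classification of an unseen sign video."""
    feat = encode_video(video_frames)
    return max(prototypes, key=lambda w: cosine(feat, prototypes[w]))

print(predict(np.random.rand(25, 10, 64)))
```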
Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
Keshav Bansal | Harsh Agarwal | Abhinav Joshi | Ashutosh Modi
Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models
Emotion Recognition in Conversations (ERC) is an important and active research area. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some stimulus evokes a change; there is a continuous ebb and flow of emotions in a conversation. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component that improves performance. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications). We experiment with different variants of the model, and results show that the inclusion of the emotion-shift signal helps the model outperform existing models for ERC on the MOSEI and IEMOCAP datasets.
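One way to picture a modular emotion-shift component, purely as a sketch under our own assumptions rather than the paper's architecture, is a small gate that predicts from consecutive utterance representations whether the speaker's emotion shifts, and uses that probability to modulate the running emotional state.

```python
import torch
import torch.nn as nn

class EmotionShiftGate(nn.Module):
    """Predicts P(shift) between consecutive utterances and gates the state update."""
    def __init__(self, dim):
        super().__init__()
        self.shift_head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                        nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, utter_feats):  # (batch, seq_len, dim) fused multimodal utterance features
        # Predecessor features; the first utterance is treated as its own predecessor.
        prev = torch.cat([utter_feats[:, :1], utter_feats[:, :-1]], dim=1)
        p_shift = self.shift_head(torch.cat([prev, utter_feats], dim=-1))  # (batch, seq, 1)
        # Keep the previous emotional state when no shift is predicted, move to the new one otherwise.
        state = p_shift * utter_feats + (1 - p_shift) * prev
        return state, p_shift.squeeze(-1)

gate = EmotionShiftGate(dim=128)
state, p_shift = gate(torch.randn(2, 6, 128))
print(state.shape, p_shift.shape)
```

A component of this shape can be bolted onto an existing multimodal ERC model by feeding it the fused utterance features and using the shift probability as an auxiliary signal.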
COGMEN: COntextualized GNN based Multimodal Emotion recognitioN
Abhinav Joshi | Ashwani Bhat | Ayush Jain | Atin Singh | Ashutosh Modi
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Emotions are an inherent part of human interactions, and consequently, it is imperative to develop AI systems that understand and recognize human emotions. During a conversation involving several people, a person's emotions are influenced by the other speakers' utterances and by their own emotional state across utterances. In this paper, we propose the COntextualized Graph Neural Network based Multimodal Emotion recognitioN (COGMEN) system, which leverages local information (i.e., inter/intra-speaker dependencies) and global information (context). The proposed model uses a Graph Neural Network (GNN) based architecture to model the complex dependencies (local and global information) in a conversation. Our model gives state-of-the-art (SOTA) results on the IEMOCAP and MOSEI datasets, and detailed ablation experiments show the importance of modeling information at both levels.
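The local dependency structure described can be illustrated by how a conversation graph might be built for a GNN; the edge types and window size below are our own assumptions, not COGMEN's actual construction.

```python
import torch

def build_conversation_edges(speakers, window=2):
    """Edge index over utterance nodes: intra-speaker and inter-speaker links
    within a context window, in both directions."""
    src, dst, etype = [], [], []
    n = len(speakers)
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if i == j:
                continue
            src.append(i)
            dst.append(j)
            etype.append(0 if speakers[i] == speakers[j] else 1)  # 0: intra-speaker, 1: inter-speaker
    return torch.tensor([src, dst]), torch.tensor(etype)

# Utterance-level speakers of a 5-turn dialogue between speakers A and B.
edge_index, edge_type = build_conversation_edges(["A", "B", "A", "A", "B"])
print(edge_index.shape, edge_type.tolist())
```

Node features would be the fused multimodal utterance embeddings, and the resulting edge index and edge types could feed a relational graph network, with a separate global context encoder supplying the conversation-level information.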