Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)

Ashutosh Modi, Saptarshi Ghosh, Asif Ekbal, Pawan Goyal, Sarika Jain, Abhinav Joshi, Shivani Mishra, Debtanu Datta, Shounak Paul, Kshetrimayum Boynao Singh, Sandeep Kumar (Editors)

Anthology ID: 2025.justnlp-main
Month: December
Year: 2025
Address: Mumbai, India
Venues: JUSTNLP | WS
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.justnlp-main/
ISBN: 979-8-89176-312-8
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.justnlp-main.pdf


The first iteration of the JUST-NLP: Workshop on NLP for Empowering Justice was organized to accelerate research in Natural Language Processing for legal text processing. The inaugural edition, JUST-NLP 2025, was held as a hybrid event at IJCNLP-AACL 2025 on December 24 at IIT Bombay. The program featured a research track, four invited talks, and two shared tasks: (1) L-SUMM, an abstractive summarization task for Indian legal judgments, and (2) L-MT, a legal machine translation task between English and Hindi. The workshop received strong interest from the community, with 29 submissions, of which 21 were accepted. Among the accepted papers, 5 were regular research-track papers published in the proceedings, and 2 were accepted as non-archival presentations. For the shared tasks, 9 papers were accepted for L-SUMM, and 5 papers were accepted for L-MT, for publication in the proceedings. The workshop focused on a broad set of Legal NLP challenges, including information extraction, retrieval, multilingual processing, legal reasoning, and applications of large language models. Overall, JUST-NLP 2025 aimed to bring together AI researchers and legal practitioners to develop scalable, domain-aware NLP methods that can support legal workflows and contribute toward more efficient and equitable justice systems.

This paper presents an overview of the Shared Task on Summarization of Indian Court Judgments (L-SUMM), hosted by the JUST-NLP 2025 Workshop at IJCNLP-AACL 2025. This task aims to increase research interest in automatic summarization techniques for lengthy and intricate legal documents from the Indian judiciary. It particularly addresses court judgments that contain dense legal reasoning and semantic roles that must be preserved in summaries. As part of this shared task, we introduce the Indian Legal Summarization (L-SUMM) dataset, comprising 1,800 Indian court judgments paired with expert-written abstractive summaries, both in English. Therefore, the task focuses on generating high-quality abstractive summaries of court judgments in English. A total of 9 teams participated in this task, exploring a diverse range of methodologies, including transformer-based models, extractive-abstractive hybrids, graph-based ranking approaches, long-context LLMs, and rhetorical-role-based techniques. This paper describes the task setup, dataset, evaluation framework, and our findings. We report the results and highlight key trends across participant approaches, including the effectiveness of hybrid pipelines and challenges in handling extreme sequence lengths.

This paper provides an overview of the Shared Task on Legal Machine Translation (L-MT), organized as part of the JUST-NLP 2025 Workshop at IJCNLP-AACL 2025, aimed at improving the translation of legal texts, a domain where precision, structural faithfulness, and terminology preservation are essential. The training set comprises 50,000 sentences, with 5,000 sentences each for the validation and test sets. The submissions employed strategies such as domain-adaptive fine-tuning of multilingual models, QLoRA-based parameter-efficient adaptation, curriculum-guided supervised training, reinforcement learning with verifiable MT metrics, and from-scratch Transformer training. The systems are evaluated using BLEU, METEOR, TER, chrF++, BERTScore, and COMET. We also combine the scores of these metrics into an average score (AutoRank). The top-performing system is based on a fine-tuned distilled NLLB-200 model and achieved the highest AutoRank score of 72.1. Domain adaptation consistently yielded substantial improvements over baseline models, and precision-focused rewards proved especially effective for legal MT. The findings also highlight that large multilingual Transformers can deliver accurate and reliable English-to-Hindi legal translations when carefully fine-tuned on legal data, advancing the broader goal of improving access to justice in multilingual settings.
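The abstract above says the metric scores are combined into an average (AutoRank) but does not give the formula. The sketch below is one plausible reading, not the organizers' definition: it assumes a plain mean over the six metrics on a 0-100 scale, with TER (an error rate, lower is better) converted to 100 - TER. The function name is hypothetical; the input values are the per-metric scores quoted in the Team-SVNIT abstract in these proceedings.

```python
def autorank_sketch(scores: dict[str, float]) -> float:
    """Plain mean of metric scores on a 0-100 scale.

    Assumption (not the organizers' published formula): TER is an
    error rate, so it is flipped to 100 - TER before averaging.
    """
    adjusted = [100.0 - value if name == "TER" else value
                for name, value in scores.items()]
    return sum(adjusted) / len(adjusted)


# Per-metric scores quoted in the Team-SVNIT abstract.
team_svnit = {
    "BLEU": 51.61,
    "METEOR": 75.80,
    "TER": 37.09,
    "chrF++": 73.29,
    "BERTScore": 92.61,
    "COMET": 76.36,
}

print(round(autorank_sketch(team_svnit), 2))
```

Under this assumption the mean is (51.61 + 75.80 + 62.91 + 73.29 + 92.61 + 76.36) / 6, or about 72.10, which is consistent with the top AutoRank score of 72.1 stated above.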

LeCNet: A Legal Citation Network Benchmark Dataset
Pooja Harde | Bhavya Jain | Sarika Jain

Legal document analysis is pivotal in modern judicial systems, particularly for case retrieval, classification, and recommendation tasks. Graph neural networks (GNNs) have revolutionized legal use cases by enabling the efficient analysis of complex relationships. Although existing legal citation network datasets have significantly advanced research in this domain, the lack of large-scale open-source datasets tailored to the Indian judicial system has limited progress. To address this gap, we present the Indian Legal Citation Network (LeCNet) - the first open-source benchmark dataset for the link prediction task (missing citation recommendation) in the Indian judicial context. The dataset has been created by extracting information from the original judgments. LeCNet comprises 26,308 nodes representing case judgments and 67,108 edges representing citation relationships between the case nodes. Each node is described with rich features of document embeddings that incorporate contextual information from the case documents. Baseline experiments using various machine learning models were conducted for dataset validation. The Mean Reciprocal Rank (MRR) metric is used for model evaluation. The results obtained demonstrate the utility of the LeCNet dataset, highlighting the advantages of graph-based representations over purely textual models.
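For readers unfamiliar with the metric, Mean Reciprocal Rank for missing-citation recommendation can be sketched as follows. This is a generic illustration of MRR, not the LeCNet evaluation code; the case identifiers and the function name are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item per query.

    A query with no relevant item in its ranked list contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for position, candidate in enumerate(ranked, start=1):
            if candidate in relevant:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


# Hypothetical example: three query cases, each with ranked citation candidates.
ranked = [["c3", "c1", "c7"], ["c2", "c9", "c4"], ["c5", "c6", "c8"]]
truth = [{"c1"}, {"c2"}, {"c0"}]
print(mean_reciprocal_rank(ranked, truth))  # (1/2 + 1 + 0) / 3 = 0.5
```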

Legal Document Summarization: A Zero-shot Modular Agentic Workflow Approach
Taha Sadikot | Sarika Jain

The increasing volume and complexity of Indian High Court judgments require high-quality automated summarization systems. Our agentic workflow framework for the summarization of Indian High Court judgments achieves competitive results without model fine-tuning. Experiments on CivilSum and IN-Abs test sets report ROUGE-1 F1 up to 0.547 and BERTScore F1 up to 0.866, comparable to state-of-the-art supervised models, with advantages in transparency and efficiency. We introduce two zero-shot modular agentic workflows: Lexical Modular Summarizer (LexA), a three-stage modular architecture optimized for lexical overlap (ROUGE), and Semantic Agentic Summarizer (SemA), a five-stage integrated architecture optimized for semantic similarity (BERTScore). Both workflows operate without supervised model fine-tuning, instead relying on strategic data processing, modular agent orchestration, and carefully engineered prompts. Our framework achieves ROUGE-1 F1 of 0.6326 and BERTScore F1 of 0.8902 on CivilSum test set, and ROUGE-1 F1 of 0.1951 and BERTScore F1 of 0.8299 on IN-Abs test set, substantially outperforming zero-shot baselines, rivaling leading fine-tuned transformer models while requiring no supervised training. This work demonstrates that modular, zero-shot agentic approaches can deliver production-grade results for legal summarization, offering a new direction for resource-limited judicial settings.

LLM Driven Legal Text Analytics: A Case Study For Food Safety Violation Cases
Suyog Joshi | Soumyajit Basu | Lipika Dey | Partha Pratim Das

Despite comprehensive food safety regulations worldwide, violations continue to pose significant public health challenges. This paper presents an LLM-driven pipeline for analyzing legal texts to identify structural and procedural gaps in food safety enforcement. We develop an end-to-end system that leverages Large Language Models to extract structured entities from legal judgments, construct statute-and-provision-level knowledge graphs, and perform semantic clustering of cases. Applying our approach to 782 Indian food safety violation cases filed between 2022 and 2024, we uncover critical insights: 96% of cases were filed by individuals and organizations against state authorities, with 60% resulting in decisions favoring appellants. Through automated clustering and analysis, we identify major procedural lapses including unclear jurisdictional boundaries between enforcement agencies, insufficient evidence collection, and ambiguous penalty guidelines. Our findings reveal concrete weaknesses in current enforcement practices and demonstrate the practical value of LLMs for legal analysis at scale.

Access to consumer grievance redressal in India is often hindered by procedural complexity, legal jargon, and jurisdictional challenges. To address this, we present Grahak-Nyay (Justice-to-Consumers), a chatbot that streamlines the process using open-source Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). Grahak-Nyay simplifies legal complexities through a concise and up-to-date knowledge base. We introduce three novel datasets: GeneralQA (general consumer law), SectoralQA (sector-specific knowledge) and SyntheticQA (for RAG evaluation), along with NyayChat, a dataset of 303 annotated chatbot conversations. We also introduce Judgments data sourced from Indian Consumer Courts to aid the chatbot in decision making and to enhance user trust. We also propose HAB metrics (Helpfulness, Accuracy, Brevity) to evaluate chatbot performance. Legal domain experts validated Grahak-Nyay’s effectiveness. Code and datasets are available at https://github.com/ShreyGanatra/GrahakNyay.git.

AI-based judicial assistance and case prediction have been extensively studied in criminal and civil domains, but remain largely unexplored in consumer law, especially in India. In this paper, we present Nyay-Darpan, a novel two-in-one framework that (i) summarizes consumer case files and (ii) retrieves similar case judgements to aid decision-making in consumer dispute resolution. Our methodology not only addresses the gap in consumer law AI tools, but also introduces an innovative approach to evaluate the quality of the summary. The term ‘Nyay-Darpan’ translates into ‘Mirror of Justice’, symbolizing the ability of our tool to reflect the core of consumer disputes through precise summarization and intelligent case retrieval. Our system achieves over 75 percent precision in similar case prediction and approximately 70 percent accuracy across material summary evaluation metrics, demonstrating its practical effectiveness. We will publicly release the Nyay-Darpan framework and dataset to promote reproducibility and facilitate further research in this underexplored yet impactful domain.

Cold Starts and Hard Cases: A Two-Stage SFT-RLVR Approach for Legal Machine Translation (Just-NLP L-MT shared task)
Pawitsapak Akarajaradwong | Chompakorn Chaksangchaichot

This paper details our system for the JUST-NLP 2025 Shared Task on English-to-Hindi Legal Machine Translation. We propose a novel two-stage, data-centric approach. First, we annotate the training data by translation difficulty and create easy and hard subsets. We perform SFT on the easier subset to establish a robust “cold start”. Then, we apply RLVR exclusively on the harder subset, using machine translation metrics as reward signals. This strategy allowed our system to significantly outperform strong baselines, demonstrating the capability of our systems for machine translation tasks. Source code and model weights are available at https://github.com/ppaolong/FourCorners-JustNLP-MT-Shared-Task

Contextors at L-SUMM: Retriever-Driven Multi-Generator Summarization
Pavithra Neelamegam | S Jaya Nirmala

Indian court judgments are very difficult to summarize automatically because of their length, complex legal reasoning, and scattered important information. This paper outlines the methodology used for the Legal Summarization (L-SUMM) shared task at the JUST-NLP 2025 Workshop, which aims to provide abstractive summaries of roughly 500 words from English-language Indian court rulings that are logical, concise, and factually accurate. The paper proposes a Retriever-Driven Multi-Generator Summarization framework that combines a semantic retriever with the fine-tuned encoder–decoder models BART, Pegasus, and LED to enhance legal document summarization. To address document length and reduce hallucinations, the pipeline uses iterative retrieval expansion to choose relevant text chunks, cross-model validation to guarantee factual consistency, and cosine similarity analysis to improve summary faithfulness. Despite being limited to 400–500 words, the generated summaries successfully convey legal reasoning. Our team Contextors achieved an average score of 22.51, ranking 4th out of 9 in the L-SUMM shared task leaderboard, demonstrating the efficacy of the Retriever-Driven Multi-Generator Summarization approach, which improves transparency, accessibility, and effective understanding of legal documents. This method shows excellent content coverage and coherence when assessed using ROUGE-2, ROUGE-L, and BLEU criteria.

A Budget Recipe for Finetuning a Long-form Legal Summarization Model
Chompakorn Chaksangchaichot | Pawitsapak Akarajaradwong

We describe an inexpensive system that ranked first in the JUST-NLP 2025 L-SUMM task, summarizing very long Indian court judgments (up to 857k characters) using a single 80GB GPU and a total budget of about $50. Our pipeline first filters out length–summary outliers, then applies two-stage LoRA SFT on Qwen3-4B-Instruct-2507 to learn style and extend context, and finally runs RLVR tuned to BLEU, ROUGE-2, and ROUGE-L, with BLEU upweighted. We show that two-stage SFT is better than a single-stage run, and that RLVR gives the largest gains, reaching 32.71 internal vs. 16.15 base and 29.91 on the test leaderboard. In an ablation on prompting, we find that a simple, naive prompt converges faster but saturates earlier, while the curated legal-structured prompt keeps improving with longer training and yields higher final scores, and the finetuned model remains fairly robust to unseen prompts. Our code is fully open-sourced for reproducibility.

SCaLAR_NITK @ JUSTNLP Legal Summarization (L-SUMM) Shared Task
Arjun T D | Anand Kumar Madasamy

This paper presents the systems we submitted to the JUST-NLP 2025 Shared Task on Legal Summarization (L-SUMM). Creating abstractive summaries of lengthy Indian court rulings is challenging due to transformer token limits. To address this problem, we compare three systems built on a fine-tuned Legal Pegasus model. System 1 (Baseline) applies a standard hierarchical framework that chunks long documents using naive token-based segmentation. System 2 (RR-Chunk) improves this approach by using a BERT-BiLSTM model to tag sentences with rhetorical roles (RR) and incorporating these tags (e.g., [Facts] ...) to enable structurally informed chunking for hierarchical summarization. System 3 (WRR-Tune) tests whether explicit importance cues help the model by assigning importance scores to each RR using the geometric mean of their distributional presence in judgments and human summaries, and finetuning a separate model on text augmented with these tags (e.g., [Facts, importance score 13.58]). A comparison of the three systems demonstrates the value of progressively adding structural and quantitative importance signals to the model’s input.
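The importance score in System 3 can be illustrated with a small sketch. It assumes that "distributional presence" means the percentage of sentences carrying a given rhetorical role in judgments and in human summaries, which the abstract does not spell out; the shares below and the function names are hypothetical and are not intended to reproduce the 13.58 quoted above.

```python
import math


def role_importance(judgment_share: float, summary_share: float) -> float:
    """Geometric mean of a role's share in judgments and in summaries.

    Assumption: both shares are percentages of sentences tagged with the role.
    """
    return math.sqrt(judgment_share * summary_share)


def tag_sentence(sentence: str, role: str, score: float) -> str:
    """Prefix a sentence with its role and importance score."""
    return f"[{role}, importance score {score:.2f}] {sentence}"


# Hypothetical shares: (percent in judgments, percent in summaries).
shares = {"Facts": (16.0, 9.0), "Analysis": (4.0, 25.0)}
for role, (in_judgments, in_summaries) in shares.items():
    score = role_importance(in_judgments, in_summaries)
    print(tag_sentence("Example sentence.", role, score))
```

The geometric mean rewards roles that are well represented in both distributions; a role that is frequent in judgments but nearly absent from summaries scores low.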

goodmen @ L-MT Shared Task: A Comparative Study of Neural Models for English-Hindi Legal Machine Translation
Deeraj S K | Karthik Suryanarayanan | Yash Ingle | Pruthwik Mishra

In a massively multilingual country like India, providing legal judgments in understandable native languages is essential for equitable justice to all. The Legal Machine Translation (L-MT) shared task focuses on translating legal content from English to Hindi which is the most spoken language in India. We present a comprehensive evaluation of neural machine translation models for English-Hindi legal document translation, developed as part of the L-MT shared task. We investigate four multilingual and Indic focused translation systems. Our approach emphasizes domain specific fine-tuning on legal corpus while preserving statutory structure, legal citations, and jurisdictional terminology. We fine-tune two legal focused translation models, InLegalTrans and IndicTrans2 on the English-Hindi legal parallel corpus provided by the organizers where the use of any external data is constrained. The fine-tuned InLegalTrans model achieves the highest BLEU score of 0.48. Comparative analysis reveals that domain adaptation through fine-tuning on legal corpora significantly enhances translation quality for specialized legal texts. Human evaluation confirms superior coherence and judicial tone preservation in InLegalTrans outputs. Our best performing model is ranked 3rd on the test data.

NIT-Surat@L-Sum: A Semantic Retrieval-Based Framework for Summarizing Indian Judicial Documents
Nita Jadav | Ashok Urlana | Pruthwik Mishra

The shared task of Legal Summarization (L-Summ) focuses on generating abstractive summaries for Indian court judgments in English. This task presents unique challenges in producing fluent, relevant, and legally appropriate summaries given voluminous judgment texts. We experiment with different sequence-to-sequence models and present a comprehensive comparative study of their performance. We also evaluate various Large Language Models (LLMs) in zero-shot settings to test their summarization capabilities. Our best performing model is fine-tuned on a pre-trained legal summarization model where relevant passages are identified using the maximum marginal relevance (MMR) technique. Our findings highlight that retrieval-augmented fine-tuning is an effective approach for generating precise and concise legal summaries. We obtained a rank of 5th overall.
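The MMR passage selection mentioned above can be sketched generically. This is the textbook greedy MMR criterion over embedding vectors, not the authors' pipeline; the toy vectors, the trade-off value `lam`, and the function names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(query_vec, passage_vecs, k, lam=0.5):
    """Greedily pick k passage indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(passage_vecs)))

    def mmr_score(i):
        relevance = cosine(query_vec, passage_vecs[i])
        redundancy = max(
            (cosine(passage_vecs[i], passage_vecs[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings: passages 0 and 1 are near-duplicates, passage 2 differs.
query = [1.0, 1.0, 0.0]
passages = [[1.0, 0.2, 0.0], [1.0, 0.25, 0.0], [0.2, 1.0, 0.0]]
print(mmr_select(query, passages, k=2))  # [1, 2]
```

In the toy example the most relevant passage is chosen first, and the second pick skips its near-duplicate in favour of the passage that adds new information.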

Adapting IndicTrans2 for Legal Domain MT via QLoRA Fine-Tuning at JUST-NLP 2025
Akoijam Jenil Singh | Loitongbam Sanayai Meetei | Yumnam Surajkanta

Machine Translation (MT) in the legal domain presents substantial challenges due to its complex terminology, lengthy statutes, and rigid syntactic structures. The JUST-NLP 2025 Shared Task on Legal Machine Translation was organized to advance research on domain-specific MT systems for legal texts. In this work, we propose a fine-tuned version of the pretrained large language model (LLM) ai4bharat/indictrans2-en-indic-1B, a transformer-based English-to-Indic translation model. Fine-tuning was performed using the parallel corpus provided by the JUST-NLP 2025 Shared Task organizers. Our adapted model demonstrates notable improvements over the baseline system, particularly in handling domain-specific legal terminology and complex syntactic constructions. In automatic evaluation, our system obtained BLEU = 46.67 and chrF = 70.03. In human evaluation, it achieved adequacy = 4.085 and fluency = 4.006. Our approach achieved an AutoRank score of 58.79, highlighting the effectiveness of domain adaptation through fine-tuning for legal machine translation.

Team-SVNIT at JUST-NLP 2025: Domain-Adaptive Fine-Tuning of Multilingual Models for English–Hindi Legal Machine Translation
Rupesh Dhakad | Naveen Kumar | Shrikant Malviya

Translating sentences between English and Hindi is challenging, especially in the domain of legal documents. The complexity stems mainly from specialized legal terminology, long and complex sentences, and the accuracy constraint. This paper presents a system developed by Team-SVNIT for the JUST-NLP 2025 shared task on legal machine translation. We fine-tune and compare multiple pretrained multilingual translation models, including facebook/nllb-200-distilled-1.3B, on a corpus of 50,000 English–Hindi legal sentence pairs provided for the shared task. The training pipeline includes preprocessing, context windows of 512 tokens, and decoding methods to enhance translation quality. The proposed method secured 1st place on the official leaderboard with an AutoRank score of 61.62. We obtained the following scores on various metrics: BLEU 51.61, METEOR 75.80, TER 37.09, chrF++ 73.29, BERTScore 92.61, and COMET 76.36. These results demonstrate that fine-tuning multilingual models for a domain-specific machine translation task enhances performance and works better than general multilingual translation systems.

Combining Extractive and Generative Methods for Legal Summarization: BLANCKED at JUST-NLP 2025
Erich Giusseppe Soto Parada | Carlos Manuel Muñoz Almeida | David Cuevas Alba

This paper presents Tayronas Trigrams’s methodology and findings from our participation in the JUST-NLP 2025 Shared Task of Legal Summarization (L-SUMM), which focused on generating abstractive summaries of lengthy Indian court judgments. Our initial approach involved evaluating and fine-tuning specialized sequence-to-sequence models like Legal-Pegasus, Indian Legal LED, and BART. We found that these small generative models, even after fine-tuning on the limited InLSum dataset (1,200 training examples), delivered performance (e.g., Legal-Pegasus AVG score: 16.50) significantly below expectations. Consequently, our final, best-performing method was a hybrid extractive-abstractive pipeline. This approach first employed the extractive method PACSUM to select the most important sentences, yielding an initial AVG score of 20.04, and then utilized a Large Language Model (specifically Gemini 2.5 Pro), correctly prompted, to perform the final abstractive step by seamlessly stitching and ensuring coherence between these extracted chunks. This hybrid strategy achieved an average ROUGE-2 of 21.05, ROUGE-L of 24.35, and BLEU of 15.12, securing 7th place in the competition. Our key finding is that, under data scarcity, a two-stage hybrid approach dramatically outperforms end-to-end abstractive fine-tuning on smaller models.

Automatic Legal Judgment Summarization Using Large Language Models: A Case Study for the JUST-NLP 2025 Shared Task
Santiago Chica

This paper presents the proposal developed for the JUST-NLP 2025 Shared Task on Legal Summarization, which aims to generate abstractive summaries of Indian court judgments. The work describes the motivation, dataset analysis, related work, and proposed methodology based on Large Language Models (LLMs). We analyze the Indian Legal Summarization (InLSum) dataset, review four relevant articles in the summarization of legal texts, and describe the experimental setup involving GPT-4.1 to evaluate the effectiveness of different prompting strategies. The evaluation will follow the ROUGE and BLEU metrics, consistent with the competition protocol.

Structure-Aware Chunking for Abstractive Summarization of Long Legal Documents
Himadri Sonowal | Saisab Sadhu

The efficacy of state-of-the-art abstractive summarization models is severely constrained by the extreme document lengths of legal judgments, which consistently surpass their fixed input capacities. The prevailing method, naive sequential chunking, is a discourse-agnostic process that induces context fragmentation and degrades summary coherence. This paper introduces Structure-Aware Chunking (SAC), a rhetorically-informed pre-processing pipeline that leverages the intrinsic logical structure of legal documents. We partition judgments into their constituent rhetorical strata—Facts, Arguments & Analysis, and Conclusion—prior to the summarization pass. We present and evaluate two SAC instantiations: a computationally efficient heuristic-based segmenter and a semantically robust LLM-driven approach. Empirical validation on the JUST-NLP 2025 L-SUMM shared task dataset reveals a nuanced trade-off: while our methods improve local, n-gram based metrics (ROUGE-2), they struggle to maintain global coherence (ROUGE-L). We identify this “coherence gap” as a critical challenge in chunk-based summarization and show that advanced LLM-based segmentation begins to bridge it.

From Scratch to Fine-Tuned: A Comparative Study of Transformer Training Strategies for Legal Machine Translation
Amit Barman | Atanu Mandal | Sudip Kumar Naskar

In multilingual nations like India, access to legal information is often hindered by language barriers, as much of the legal and judicial documentation remains in English. Legal Machine Translation (L-MT) offers a scalable solution to this challenge by enabling accurate and accessible translations of legal documents. This paper presents our work for the JUST-NLP 2025 Legal MT shared task, focusing on English–Hindi translation using Transformer-based approaches. We experiment with two complementary strategies: fine-tuning a pre-trained OPUS-MT model for domain-specific adaptation and training a Transformer model from scratch using the provided legal corpus. Performance is evaluated using standard MT metrics, including SacreBLEU, chrF++, TER, ROUGE, BERTScore, METEOR, and COMET. Our fine-tuned OPUS-MT model achieves a SacreBLEU score of 46.03, significantly outperforming both baseline and from-scratch models. The results highlight the effectiveness of domain adaptation in enhancing translation quality and demonstrate the potential of L-MT systems to improve access to justice and legal transparency in multilingual contexts.

Integrating Graph based Algorithm and Transformer Models for Abstractive Summarization
Sayed Ayaan Ahmed Sha | Sangeetha Sivanesan | Anand Kumar Madasamy | Navya Binu

Summarizing legal documents is a challenging and critical task in the field of Natural Language Processing (NLP). Generating abstractive summaries of legal judgments is particularly difficult because language models limit the number of input tokens. In this paper, we experiment with two models for generating abstractive summaries of Indian legal documents as part of the JUSTNLP 2025 Shared Task on Legal Summarization: a BART base model fine-tuned on the CNN/DailyMail dataset combined with TextRank, and pegasus_indian_legal, a version of legal-pegasus fine-tuned on Indian legal judgments. BART+TextRank outperformed pegasus_indian_legal with a score of 18.84.
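The graph-based ranking step can be sketched in a few lines. This is a simplified TextRank-style extractor, assuming Jaccard word overlap as the sentence similarity (the original TextRank uses a log-normalised overlap); it does not reproduce how the authors combine TextRank with BART, and the function name and example sentences are hypothetical.

```python
def textrank_sentences(sentences, top_k=2, damping=0.85, iterations=50):
    """Return the top_k sentences by PageRank over a word-overlap graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(s.lower().split()) for s in sentences]

    # Edge weight: Jaccard overlap between the word sets of two sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and tokens[i] and tokens[j]:
                shared = len(tokens[i] & tokens[j])
                weights[i][j] = shared / len(tokens[i] | tokens[j])
    out_sum = [sum(row) for row in weights]

    # Power iteration of the weighted PageRank update.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]  # keep document order


example = [
    "the court dismissed the appeal",
    "the appeal was dismissed by the high court",
    "the court awarded costs to the respondent",
    "rain fell heavily yesterday",
]
print(textrank_sentences(example, top_k=2))
```

Selecting a handful of high-ranking sentences in this way shortens the input before it reaches a generator with a fixed token limit.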

Hierarchical Long-Document Summarization using LED for Legal Judgments
Reshma Sheik | Noah John Puthayathu | Fathima Firose A | Jonathan Paul

This paper describes our system for the L-SUMM shared task on legal document summarization. Our approach is built on the Longformer Encoder-Decoder (LED) model, which we augment with a multi-level summarization strategy tailored for legal documents that are substantially longer than typical transformer input limits. The system achieved competitive performance on the legal judgment summarization task through optimized training strategies, including gradient accumulation, Adafactor optimization, and hyperparameter tuning. Our findings indicate that combining hierarchical processing with strategically assigned global attention enables more reliable summarization of lengthy legal texts.
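A multi-level strategy of this kind can be sketched as a loop that summarizes chunks and then summarizes the concatenated chunk summaries. This is a generic outline, not the authors' system: the word-based chunking, the `max_levels` cap, and the stand-in summarizer (which simply keeps the first few words) are assumptions, and in practice `summarize` would wrap a model such as LED.

```python
from typing import Callable


def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def hierarchical_summarize(text: str,
                           summarize: Callable[[str], str],
                           chunk_size: int = 512,
                           max_levels: int = 5) -> str:
    """Summarize chunk by chunk until the text fits in a single chunk."""
    current = text
    for _ in range(max_levels):
        chunks = chunk_words(current, chunk_size)
        if len(chunks) <= 1:
            break
        # Each level replaces the text with its concatenated chunk summaries.
        current = " ".join(summarize(chunk) for chunk in chunks)
    return summarize(current)


def first_words(chunk: str, limit: int = 10) -> str:
    """Placeholder summarizer: keep the first `limit` words of the chunk."""
    return " ".join(chunk.split()[:limit])


long_text = " ".join(f"w{i}" for i in range(1000))
print(hierarchical_summarize(long_text, first_words, chunk_size=100))
```

The `max_levels` cap keeps the loop bounded even if the summarizer fails to shorten its input.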