LeCNet: A Legal Citation Network Benchmark Dataset

Pooja Harde; Bhavya Jain; Sarika Jain

Abstract

Legal document analysis is pivotal in modern judicial systems, particularly for case retrieval, classification, and recommendation tasks. Graph neural networks (GNNs) have advanced these tasks by enabling efficient analysis of the complex relationships among legal documents. Although existing legal citation network datasets have significantly advanced research in this domain, the lack of large-scale open-source datasets tailored to the Indian judicial system has limited progress. To address this gap, we present the Indian Legal Citation Network (LeCNet), the first open-source benchmark dataset for the link prediction task (missing citation recommendation) in the Indian judicial context. The dataset was created by extracting information from the original judgments. LeCNet comprises 26,308 nodes representing case judgments and 67,108 edges representing citation relationships between those cases. Each node carries document-embedding features that incorporate contextual information from the case documents. To validate the dataset, we conducted baseline experiments with various machine learning models, using Mean Reciprocal Rank (MRR) as the evaluation metric. The results demonstrate the utility of the LeCNet dataset and highlight the advantages of graph-based representations over purely textual models.
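
For readers unfamiliar with the metric, the following is a minimal sketch of how MRR can be computed for link prediction. The abstract does not describe the ranking protocol, so this assumes that each true citation edge is ranked against a set of sampled non-citation candidates; the function name `mean_reciprocal_rank` and its arguments are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(
    pos_scores: Sequence[float],
    neg_scores: Sequence[Sequence[float]],
) -> float:
    """Average of 1/rank over all true citation edges.

    pos_scores[i] is the model score for the i-th true citation edge.
    neg_scores[i] holds the scores of the negative (non-citation)
    candidates that edge is ranked against. This protocol is an
    assumption, not the paper's documented setup.
    """
    if len(pos_scores) == 0 or len(pos_scores) != len(neg_scores):
        raise ValueError("need one non-empty, aligned list of negatives per positive")

    total = 0.0
    for pos, negs in zip(pos_scores, neg_scores):
        # Rank of the true edge: 1 plus the number of negatives scored strictly higher.
        rank = 1 + sum(1 for s in negs if s > pos)
        total += 1.0 / rank
    return total / len(pos_scores)


# Two true edges: the first outranks all its negatives (rank 1),
# the second is beaten by two negatives (rank 3).
example_mrr = mean_reciprocal_rank(
    pos_scores=[0.9, 0.4],
    neg_scores=[[0.1, 0.5], [0.6, 0.7, 0.2]],
)
```

In this toy example the reciprocal ranks are 1 and 1/3, so the MRR is their average, 2/3. Ties are resolved in favor of the true edge here; other tie-handling conventions would give slightly different values.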

Anthology ID: 2025.justnlp-main.4
Volume: Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
Month: December
Year: 2025
Address: Mumbai, India
Editors: Ashutosh Modi, Saptarshi Ghosh, Asif Ekbal, Pawan Goyal, Sarika Jain, Abhinav Joshi, Shivani Mishra, Debtanu Datta, Shounak Paul, Kshetrimayum Boynao Singh, Sandeep Kumar
Venues: JUSTNLP | WS
Publisher: Association for Computational Linguistics
Pages: 18–28
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.justnlp-main.4/
Cite (ACL): Pooja Harde, Bhavya Jain, and Sarika Jain. 2025. LeCNet: A Legal Citation Network Benchmark Dataset. In Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025), pages 18–28, Mumbai, India. Association for Computational Linguistics.
Cite (Informal): LeCNet: A Legal Citation Network Benchmark Dataset (Harde et al., JUSTNLP 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.justnlp-main.4.pdf