@inproceedings{demir-canbaz-2025-validate,
    title = "Validate Your Authority: Benchmarking {LLM}s on Multi-Label Precedent Treatment Classification",
    author = "Demir, M. Mikail and
      Canbaz, M. Abdullah",
    editor = "Aletras, Nikolaos and
      Chalkidis, Ilias and
      Barrett, Leslie and
      Goanț{\u{a}}, C{\u{a}}t{\u{a}}lina and
      Preoțiuc-Pietro, Daniel and
      Spanakis, Gerasimos",
    booktitle = "Proceedings of the Natural Legal Language Processing Workshop 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.nllp-1.13/",
    doi = "10.18653/v1/2025.nllp-1.13",
    pages = "172--183",
    isbn = "979-8-89176-338-8",
    abstract = "Automating the classification of negative treatment in legal precedent is a critical yet nuanced NLP task where misclassification carries significant risk. To address the shortcomings of standard accuracy, this paper introduces a more robust evaluation framework. We benchmark modern Large Language Models on a new, expert-annotated dataset of 239 real-world legal citations and propose a novel Average Severity Error metric to better measure the practical impact of classification errors. Our experiments reveal a performance split: Google{'}s Gemini 2.5 Flash achieved the highest accuracy on a high-level classification task (79.1{\%}), while OpenAI{'}s GPT-5-mini was the top performer on the more complex fine-grained schema (67.7{\%}). This work establishes a crucial baseline, provides a new context-rich dataset, and introduces an evaluation metric tailored to the demands of this complex legal reasoning task."
}
@comment{Markdown (Informal)}
@comment{
[Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification](https://aclanthology.org/2025.nllp-1.13/) (Demir & Canbaz, NLLP 2025)
ACL
}