Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments
Abhishek Purushothama, Junghyun Min, Brandon Waldon, Nathan Schneider
Abstract
Legal interpretation frequently involves assessing how a legal text, as understood by an ‘ordinary’ speaker of the language, applies to the set of facts characterizing a legal dispute. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments and are susceptible to subtle variations in the prompt. While instruction tuning slightly improves model calibration to human judgments, even the best-calibrated LLMs remain weak predictors of human native speakers’ judgments.
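The abstract's instability claim lends itself to a simple probe: pose the same interpretive question under several paraphrases and score how often the model's answers agree. Below is a minimal sketch of such a probe; the `query_model` stub, the example prompts, and the majority-agreement score are illustrative assumptions, not the authors' actual protocol or dataset.

```python
from collections import Counter

# Hypothetical paraphrases of one interpretive question (the example
# term is illustrative, not drawn from the paper's materials).
PROMPTS = [
    "Does the ordinary meaning of 'vehicle' include a bicycle? Answer yes or no.",
    "Would an ordinary English speaker call a bicycle a 'vehicle'? Answer yes or no.",
    "In everyday usage, is a bicycle a 'vehicle'? Answer yes or no.",
]


def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; swap in an actual client here."""
    raise NotImplementedError


def stability(answers: list[str]) -> float:
    """Share of answers agreeing with the majority answer.

    1.0 means the model gave the same judgment under every paraphrase;
    values near 1/len(answers) indicate prompt-driven instability.
    """
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)


if __name__ == "__main__":
    # Stubbed outputs stand in for real model responses.
    answers = ["Yes", "yes", "No"]
    print(f"stability = {stability(answers):.2f}")  # -> stability = 0.67
```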
- Anthology ID: 2025.nllp-1.22
- Volume: Proceedings of the Natural Legal Language Processing Workshop 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
- Venues: NLLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 317
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.22/
- Cite (ACL): Abhishek Purushothama, Junghyun Min, Brandon Waldon, and Nathan Schneider. 2025. Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 317–317, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments (Purushothama et al., NLLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.22.pdf