@inproceedings{blair-stanek-etal-2024-blt,
    title     = {{BLT}: Can Large Language Models Handle Basic Legal Text?},
    author    = {Blair-Stanek, Andrew and
                 Holzenberger, Nils and
                 Van Durme, Benjamin},
    editor    = {Aletras, Nikolaos and
                 Chalkidis, Ilias and
                 Barrett, Leslie and
                 Goan{\c{t}}{\u{a}}, C{\u{a}}t{\u{a}}lina and
                 Preo{\c{t}}iuc-Pietro, Daniel and
                 Spanakis, Gerasimos},
    booktitle = {Proceedings of the Natural Legal Language Processing Workshop 2024},
    month     = nov,
    year      = {2024},
    address   = {Miami, FL, USA},
    publisher = {Association for Computational Linguistics},
    url       = {https://aclanthology.org/2024.nllp-1.18/},
    doi       = {10.18653/v1/2024.nllp-1.18},
    pages     = {216--232},
    abstract  = {We find that the best publicly available LLMs like GPT-4 and Claude currently perform poorly on basic legal text handling. This motivates the creation of a benchmark consisting of examples that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs' poor performance on this benchmark casts into doubt their reliability as-is for legal practice. However, fine-tuning on our training set brings even a small model to near-perfect performance. This benchmark will be useful for fine-tuning LLMs for downstream legal tasks, as well as for tracking LLMs' reliability as-is for basic legal tasks.},
}
Markdown (Informal)
[BLT: Can Large Language Models Handle Basic Legal Text?](https://aclanthology.org/2024.nllp-1.18/) (Blair-Stanek et al., NLLP 2024)
ACL