Abstract
We find that the best publicly available LLMs like GPT-4 and Claude currently perform poorly on basic legal text handling. This motivates the creation of a benchmark consisting of examples that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs’ poor performance on this benchmark casts into doubt their reliability as-is for legal practice. However, fine-tuning on our training set brings even a small model to near-perfect performance. This benchmark will be useful for fine-tuning LLMs for downstream legal tasks, as well as for tracking LLMs’ reliability as-is for basic legal tasks.

- Anthology ID: 2024.nllp-1.18
- Volume: Proceedings of the Natural Legal Language Processing Workshop 2024
- Month: November
- Year: 2024
- Address: Miami, FL, USA
- Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
- Venue: NLLP
- Publisher: Association for Computational Linguistics
- Pages: 216–232
- URL: https://aclanthology.org/2024.nllp-1.18
- DOI: 10.18653/v1/2024.nllp-1.18
- Cite (ACL): Andrew Blair-Stanek, Nils Holzenberger, and Benjamin Van Durme. 2024. BLT: Can Large Language Models Handle Basic Legal Text?. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 216–232, Miami, FL, USA. Association for Computational Linguistics.
- Cite (Informal): BLT: Can Large Language Models Handle Basic Legal Text? (Blair-Stanek et al., NLLP 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.nllp-1.18.pdf