Gaps or Hallucinations? Scrutinizing Machine-Generated Legal Analysis for Fine-grained Text Evaluations
Abe Hou, William Jurayj, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme
Abstract
Large Language Models (LLMs) show promise as a writing aid for professionals performing legal analyses. However, in this setting LLMs often hallucinate, in ways that are difficult for non-professionals and existing text evaluation metrics to recognize. In this work, we pose the question: when can machine-generated legal analysis be evaluated as acceptable? We introduce the neutral notion of gaps, as opposed to hallucinations in a strictly erroneous sense, to refer to the difference between human-written and machine-generated legal analysis. Gaps do not always equate to invalid generation. Working with legal experts, we consider the CLERC generation task proposed in Hou et al. (2024b), leading to a taxonomy, a fine-grained detector for predicting gap categories, and an annotated dataset for automatic evaluation. Our best detector achieves a 67% F1 score and 80% precision on the test set. Employing this detector as an automated metric on legal analysis generated by SOTA LLMs, we find that around 80% of the generations contain hallucinations of different kinds.
- Anthology ID:
- 2024.nllp-1.24
- Volume:
- Proceedings of the Natural Legal Language Processing Workshop 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, FL, USA
- Editors:
- Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
- Venues:
- NLLP | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 280–302
- Language:
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.nllp-1.24/
- DOI:
- 10.18653/v1/2024.nllp-1.24
- Cite (ACL):
- Abe Hou, William Jurayj, Nils Holzenberger, Andrew Blair-Stanek, and Benjamin Van Durme. 2024. Gaps or Hallucinations? Scrutinizing Machine-Generated Legal Analysis for Fine-grained Text Evaluations. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 280–302, Miami, FL, USA. Association for Computational Linguistics.
- Cite (Informal):
- Gaps or Hallucinations? Scrutinizing Machine-Generated Legal Analysis for Fine-grained Text Evaluations (Hou et al., NLLP 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.nllp-1.24.pdf
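For readers unfamiliar with the precision and F1 figures reported in the abstract, the sketch below shows one way such scores could be computed for a gap-category detector. It is a minimal illustration only: the label format (one set of category strings per generated sentence) and the category names are assumptions, not the paper's actual taxonomy or data.

```python
def precision_recall_f1(gold_labels, predicted_labels):
    """Micro-averaged precision, recall, and F1 over gap-category labels.

    gold_labels / predicted_labels: lists of sets of category strings,
    one set per generated passage (hypothetical data format).
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_labels, predicted_labels):
        tp += len(gold & pred)   # categories predicted and annotated
        fp += len(pred - gold)   # categories predicted but not annotated
        fn += len(gold - pred)   # categories annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with made-up gap categories; not the paper's annotated dataset.
gold = [{"citation_gap"}, {"reasoning_gap", "citation_gap"}, set()]
pred = [{"citation_gap"}, {"reasoning_gap"}, {"citation_gap"}]
print(precision_recall_f1(gold, pred))
```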