Lena Held

2025

Contemporary LLMs struggle with extracting formal legal arguments
Lena Held | Ivan Habernal
Proceedings of the Natural Legal Language Processing Workshop 2025

Legal Argument Mining (LAM) is a complex challenge for humans and language models alike. This paper explores the application of Large Language Models (LLMs) to LAM, focusing on the identification of fine-grained argument types within judgment texts. We compare the performance of Flan-T5 and Llama 3 models against a baseline RoBERTa model to study whether the advantages of LLMs that are orders of magnitude larger can be leveraged for this task. Our study investigates the effectiveness of fine-tuning and prompting strategies in enhancing the models' ability to discern nuanced argument types. Despite employing state-of-the-art techniques, our findings indicate that neither fine-tuning nor prompting could surpass the performance of a domain-pre-trained encoder-only model. This highlights the challenges and limitations of adapting general-purpose large language models to the specialized domain of legal argumentation. The insights gained from this research contribute to the ongoing discourse on optimizing NLP models for complex, domain-specific tasks. Our code and data for reproducibility are available at https://github.com/trusthlt/legal-argument-spans.

2024

SemEval-2024 Task 5: Argument Reasoning in Civil Procedure
Lena Held | Ivan Habernal
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes the results of SemEval-2024 Task 5: Argument Reasoning in Civil Procedure, which consists of a single task on judging and reasoning about the answers to questions in U.S. civil procedure. The dataset for this task contains question, answer, and explanation triples taken from The Glannon Guide To Civil Procedure (Glannon, 2018). The task was a binary classification: deciding whether the answer is a correct choice for the question. Twenty participants submitted their solutions, with the best result achieving a remarkable 82.31% F1-score. We summarize and analyze the results from all participating systems and provide an overview of the systems of 14 participants.

2022

The Legal Argument Reasoning Task in Civil Procedure
Leonard Bongard | Lena Held | Ivan Habernal
Proceedings of the Natural Legal Language Processing Workshop 2022

We present a new NLP task and dataset from the domain of U.S. civil procedure. Each instance of the dataset consists of a general introduction to the case, a particular question, and a possible solution argument, accompanied by a detailed analysis of why the argument applies in that case. Since the dataset is based on a book aimed at law students, we believe that it represents a truly complex task for benchmarking modern legal language models. Our baseline evaluation shows that fine-tuning a legal transformer provides some advantage over random baseline models, but our analysis reveals that the actual ability to infer legal arguments remains a challenging open research question.

Co-authors

Ivan Habernal 3
Leonard Bongard 1

