Kevin Chenhao Li

2026

RegTrack: A Fine-Grained Benchmark for Multi-Class Legal Change Detection
Joe Yu | Kevin Chenhao Li | Julian Ostarek
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Organizations must continuously monitor evolving regulations to maintain compliance. While current tools are limited to surface-level text comparison, existing models lack the fine-grained classification schemes to determine whether small changes impact legal obligations or merely update formatting. To address this gap, we introduce a novel benchmark for change detection in EU regulations. It comprises 4,772 manually annotated pairs of structurally distinct provisions, defined as Atomic Legal Units (ALUs), mapped to a six-class taxonomy of legal change types. We formalize three core tasks: structural alignment, change classification, and a combined task requiring simultaneous alignment and classification. Evaluating lexical algorithms, dense encoders, and Large Language Models (LLMs) as baselines, we find LLMs excel at isolated change classification, whereas domain-specific dense encoders offer the most robust combined performance. By providing fine-grained labeled data, this benchmark enables the development of AI systems that can help organizations analyze regulatory shifts and support version-aware retrieval in the legal domain.

2025

Efficient Prompt Optimisation for Legal Text Classification with Proxy Prompt Evaluator
Hyunji Lee | Kevin Chenhao Li | Matthias Grabmair | Shanshan Xu
Proceedings of the Natural Legal Language Processing Workshop 2025

Prompt optimization aims to systematically refine prompts to enhance a language model’s performance on specific tasks. Fairness detection in Terms of Service (ToS) clauses is a challenging legal NLP task that demands carefully crafted prompts to ensure reliable results. However, existing prompt optimization methods are often computationally expensive due to inefficient search strategies and costly prompt candidate scoring. In this paper, we propose a framework that combines Monte Carlo Tree Search (MCTS) with a proxy prompt evaluator to more effectively explore the prompt space while reducing evaluation costs. Experiments demonstrate that our approach achieves higher classification accuracy and efficiency than baseline methods under a constrained computation budget.
