Translating Tax Law to Code with LLMs: A Benchmark and Evaluation Framework

Gabriele Lorenzo, Aldo Pietromatera, Nils Holzenberger


Abstract
Catala is a domain-specific programming language for tax law, meant to facilitate the translation of legal text into executable computer code, thanks to a syntax close to that of legal language and reasoning. Legal statutes paired with their Catala translation have been published online periodically, but manual translation remains labor-intensive. In this work, we develop a benchmark for the evaluation of Catala code generation from legal text, including a training set to fine-tune Large Language Models. To assess the quality of the generated code, we introduce an evaluation framework extending current metrics for code generation. Our experiments with few-shot learning, as well as fine-tuned models, suggest the feasibility of automating legal code generation, and contrast with prior attempts to translate legal language into a formal representation.
Anthology ID:
2025.nllp-1.4
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venues:
NLLP | WS
Publisher:
Association for Computational Linguistics
Pages:
31–47
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.4/
Cite (ACL):
Gabriele Lorenzo, Aldo Pietromatera, and Nils Holzenberger. 2025. Translating Tax Law to Code with LLMs: A Benchmark and Evaluation Framework. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 31–47, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Translating Tax Law to Code with LLMs: A Benchmark and Evaluation Framework (Lorenzo et al., NLLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.4.pdf