Unpacking Legal Reasoning in LLMs: Chain-of-Thought as a Key to Human-Machine Alignment in Essay-Based NLU Tasks

Yu Ying Chu; Sieh-chuen Huang; Hsuan-Lei Shao

Abstract

This study evaluates how Large Language Models (LLMs) perform deep legal reasoning on Taiwanese Status Law questions and investigates how Chain-of-Thought (CoT) prompting affects interpretability, alignment, and generalization. We use a two-stage evaluation framework. In Stage One, we decomposed six real legal essay questions into 68 sub-questions covering issue spotting, statutory application, and inheritance computation. In Stage Two, we collected full-length answers under baseline and CoT-prompted conditions. We tested four LLMs: ChatGPT-4o, Gemini, Grok3, and Copilot. Results show that CoT prompting significantly improved accuracy for Gemini (from 83.2% to 94.5%, p < 0.05) and Grok3, with moderate but consistent gains for ChatGPT and Copilot. Human evaluation of full-length responses revealed that CoT answers received notably higher scores in issue coverage and reasoning clarity, with ChatGPT and Gemini gaining +2.67 and +1.92 points respectively. Despite these gains, legal misclassifications persist, highlighting alignment gaps between surface-level fluency and expert legal reasoning. This work opens the black box of legal NLU by tracing LLM reasoning chains, quantifying performance shifts under structured prompting, and providing a diagnostic benchmark for complex, open-ended legal tasks beyond multiple-choice settings.

Anthology ID: 2025.naloma-1.1
Volume: Proceedings of the 5th Workshop on Natural Logic Meets Machine Learning (NALOMA)
Month: August
Year: 2025
Address: Bochum, Germany
Editors: Lasha Abzianidze, Valeria de Paiva
Venues: NALOMA | WS
Publisher: Association for Computational Linguistics
Pages: 1–7
URL: https://preview.aclanthology.org/gwc-25-ingestion/2025.naloma-1.1/
Cite (ACL): Yu Ying Chu, Sieh-chuen Huang, and Hsuan-Lei Shao. 2025. Unpacking Legal Reasoning in LLMs: Chain-of-Thought as a Key to Human-Machine Alignment in Essay-Based NLU Tasks. In Proceedings of the 5th Workshop on Natural Logic Meets Machine Learning (NALOMA), pages 1–7, Bochum, Germany. Association for Computational Linguistics.
Cite (Informal): Unpacking Legal Reasoning in LLMs: Chain-of-Thought as a Key to Human-Machine Alignment in Essay-Based NLU Tasks (Chu et al., NALOMA 2025)
PDF: https://preview.aclanthology.org/gwc-25-ingestion/2025.naloma-1.1.pdf
