KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI

Yongtae Lee; Surin Lee; Sumin Kim; S M Wahidur Rahman; Heung-No Lee

Abstract

Legal QA systems may benefit from training data that is expert-verified and associated with statutory provisions, as fluent generation alone cannot guarantee legally relevant and citation-supported outputs. However, existing Korean legal datasets provide limited support for legal QA and statute-associated response generation. To address this gap, we introduce KoLegalQA, a large-scale Korean legal question–answer corpus designed for research on legal QA and explanation-oriented legal response generation in real-world consultation scenarios. The dataset comprises 19k consultations collected from government-operated services, with all responses originally authored or verified by licensed legal professionals. Unlike prior resources, KoLegalQA provides explicit statutory references and clause-level summaries, enabling research on citation-associated and explanation-oriented legal response generation. We benchmark six Korean-capable LLMs using both automated evaluation (G-Eval) and human assessment across multiple criteria, including legal correctness, reasoning quality, and citation relevance. Experimental results show that fine-tuning on KoLegalQA generally improves legal reasoning validity and statute-associated response generation across most evaluated models. We present this resource as a practical benchmark dataset for Korean legal NLP research. Dataset splits, preprocessing scripts, and evaluation code will be publicly released to support reproducible research.
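To make "statute-associated" responses more concrete, here is a minimal sketch. It assumes a hypothetical record layout: a question, an answer, and a list of statutory references, each with a clause-level summary. It also includes a simple set-overlap check between the statutes a model cites and the reference statutes. The field names (`StatuteRef`, `LegalQARecord`, `law`, `article`, `summary`) and the helper `citation_precision_recall` are illustrative assumptions, not the released KoLegalQA schema. The lexical check is only a rough proxy and is not the G-Eval or human citation-relevance assessment described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StatuteRef:
    """One statutory reference (hypothetical fields, not the released schema)."""
    law: str           # statute name, e.g. "Civil Act"
    article: str       # provision identifier, e.g. "Article 750"
    summary: str = ""  # clause-level summary (hypothetical field)

    def key(self) -> tuple[str, str]:
        # Collapse whitespace and lowercase so trivially different spellings match.
        return (" ".join(self.law.split()).lower(),
                " ".join(self.article.split()).lower())


@dataclass
class LegalQARecord:
    """A hypothetical consultation record: question, expert answer, cited statutes."""
    question: str
    answer: str
    statutes: list[StatuteRef] = field(default_factory=list)


def citation_precision_recall(predicted: list[StatuteRef],
                              gold: list[StatuteRef]) -> tuple[float, float]:
    """Set-based precision and recall of model-cited statutes vs. reference statutes."""
    pred_keys = {s.key() for s in predicted}
    gold_keys = {s.key() for s in gold}
    hits = len(pred_keys & gold_keys)
    precision = hits / len(pred_keys) if pred_keys else 0.0
    recall = hits / len(gold_keys) if gold_keys else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative identifiers only; not drawn from the dataset.
    record = LegalQARecord(
        question="...",
        answer="...",
        statutes=[StatuteRef("Civil Act", "Article 750"),
                  StatuteRef("Civil Act", "Article 751")],
    )
    predicted = [StatuteRef("civil  act", "Article 750"),
                 StatuteRef("Criminal Act", "Article 347")]
    print(citation_precision_recall(predicted, record.statutes))  # (0.5, 0.5)
```

In this example, one of the two cited statutes matches one of the two reference statutes, so both precision and recall are 0.5. Exact matching on normalized identifiers does not account for paraphrased or partially correct citations. That limitation is one reason a dataset like this would also rely on model-based and human judgments of citation relevance.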

Anthology ID:: 2026.trustnlp-main.13
Volume:: Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Month:: July
Year:: 2026
Address:: San Diego, California
Editors:: Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galstyan, Anoop Kumar, Rahul Gupta
Venues:: TrustNLP | WS
Publisher:: Association for Computational Linguistics
Pages:: 240–255
URL:: https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.13/
Cite (ACL):: Yongtae Lee, Surin Lee, Sumin Kim, S M Wahidur Rahman, and Heung-No Lee. 2026. KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 240–255, San Diego, California. Association for Computational Linguistics.
Cite (Informal):: KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI (Lee et al., TrustNLP 2026)
PDF:: https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.13.pdf