KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI

Yongtae Lee, Surin Lee, Sumin Kim, S M Wahidur Rahman, Heung-No Lee


Abstract
Legal QA systems may benefit from training data that is expert-verified and associated with statutory provisions, as fluent generation alone cannot guarantee legally relevant and citation-supported outputs. However, existing Korean legal datasets provide limited support for legal QA and statute-associated response generation. To address this gap, we introduce KoLegalQA, a large-scale Korean legal question–answer corpus designed for research on legal QA and explanation-oriented legal response generation in real-world consultation scenarios. The dataset comprises 19k consultations collected from government-operated services, with all responses originally authored or verified by licensed legal professionals. Unlike prior resources, KoLegalQA provides explicit statutory references and clause-level summaries, enabling research on citation-associated and explanation-oriented legal response generation. We benchmark six Korean-capable LLMs using both automated evaluation (G-Eval) and human assessment across multiple criteria, including legal correctness, reasoning quality, and citation relevance. Experimental results show that fine-tuning on KoLegalQA generally improves legal reasoning validity and statute-associated response generation across most evaluated models. We present this resource as a practical benchmark dataset for Korean legal NLP research. Dataset splits, preprocessing scripts, and evaluation code will be publicly released to support reproducible research.
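The abstract names G-Eval as the automated evaluator but does not describe how scores are aggregated. In the original G-Eval formulation, the final rating is not the judge model's single sampled score. Instead, each candidate rating is weighted by the probability the judge LLM assigns to it, so the score is the sum over s of p(s) · s.

The sketch below shows only that weighting step. It assumes you already have per-rating probabilities from a judge model. The function name `geval_weighted_score` is hypothetical, and whether KoLegalQA's evaluation uses this probability-weighted variant is an assumption, not something the abstract states.

```python
from typing import Mapping


def geval_weighted_score(score_probs: Mapping[int, float]) -> float:
    """Hypothetical helper: probability-weighted G-Eval score, sum_s p(s) * s.

    score_probs maps each candidate rating (e.g. 1..5) to the probability the
    judge LLM assigns to emitting that rating. The probabilities are
    renormalised, because in practice the rating tokens rarely carry exactly
    all of the probability mass.
    """
    total = sum(score_probs.values())
    if total <= 0:
        raise ValueError("rating probabilities must sum to a positive value")
    # Expected rating under the (renormalised) judge distribution.
    return sum(score * p for score, p in score_probs.items()) / total


if __name__ == "__main__":
    # Illustrative probabilities for one criterion (e.g. "legal correctness");
    # these numbers are made up and are not results from the paper.
    probs = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
    print(round(geval_weighted_score(probs), 2))
```

Weighting by probability yields a continuous score that distinguishes responses the judge would otherwise round to the same integer. Scoring each criterion (legal correctness, reasoning quality, citation relevance) separately would mean calling the helper once per criterion.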
Anthology ID: 2026.trustnlp-main.13
Volume: Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Month: July
Year: 2026
Address: San Diego, California
Editors: Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galystan, Anoop Kumar, Rahul Gupta
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 240–255
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.13/
Cite (ACL): Yongtae Lee, Surin Lee, Sumin Kim, S M Wahidur Rahman, and Heung-No Lee. 2026. KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 240–255, San Diego, California. Association for Computational Linguistics.
Cite (Informal): KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI (Lee et al., TrustNLP 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.13.pdf