KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI
Yongtae Lee, Surin Lee, Sumin Kim, S M Wahidur Rahman, Heung-No Lee
Abstract
Legal QA systems may benefit from training data that is expert-verified and associated with statutory provisions, as fluent generation alone cannot guarantee legally relevant and citation-supported outputs. However, existing Korean legal datasets provide limited support for legal QA and statute-associated response generation. To address this gap, we introduce KoLegalQA, a large-scale Korean legal question–answer corpus designed for research on legal QA and explanation-oriented legal response generation in real-world consultation scenarios. The dataset comprises 19k consultations collected from government-operated services, with all responses originally authored or verified by licensed legal professionals. Unlike prior resources, KoLegalQA provides explicit statutory references and clause-level summaries, enabling research on citation-associated and explanation-oriented legal response generation. We benchmark six Korean-capable LLMs using both automated evaluation (G-Eval) and human assessment across multiple criteria, including legal correctness, reasoning quality, and citation relevance. Experimental results show that fine-tuning on KoLegalQA generally improves legal reasoning validity and statute-associated response generation across most evaluated models. We present this resource as a practical benchmark dataset for Korean legal NLP research. Dataset splits, preprocessing scripts, and evaluation code will be publicly released to support reproducible research.
- Anthology ID:
- 2026.trustnlp-main.13
- Volume:
- Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California
- Editors:
- Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galstyan, Anoop Kumar, Rahul Gupta
- Venues:
- TrustNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 240–255
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.13/
- Cite (ACL):
- Yongtae Lee, Surin Lee, Sumin Kim, S M Wahidur Rahman, and Heung-No Lee. 2026. KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 240–255, San Diego, California. Association for Computational Linguistics.
- Cite (Informal):
- KoLegalQA: A Korean Legal QA Dataset for Trustworthy and Explanation-Grounded Legal AI (Lee et al., TrustNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.13.pdf