Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments

Sungeun Hahm, Heejin Kim, Gyuseong Lee, Hyunji M. Park, Jaejin Lee


Abstract
To ensure a balance between open access to justice and personal data protection, the South Korean judiciary mandates the de-identification of court judgments before they can be publicly disclosed. However, the current de-identification process is inadequate for handling court judgments at scale while adhering to strict legal requirements. Additionally, the legal definitions and categorizations of personal identifiers are vague and not well-suited for technical solutions. To tackle these challenges, we propose a de-identification framework called Thunder-DeID, which aligns with relevant laws and practices. Specifically, we (i) construct and release the first Korean legal dataset containing annotated judgments along with corresponding lists of entity mentions, (ii) introduce a systematic categorization of Personally Identifiable Information (PII), and (iii) develop an end-to-end deep neural network (DNN)-based de-identification pipeline. Our experimental results demonstrate that our model achieves state-of-the-art performance in the de-identification of court judgments.
Anthology ID:
2025.findings-emnlp.682
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12728–12755
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.682/
DOI:
10.18653/v1/2025.findings-emnlp.682
Cite (ACL):
Sungeun Hahm, Heejin Kim, Gyuseong Lee, Hyunji M. Park, and Jaejin Lee. 2025. Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 12728–12755, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments (Hahm et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.682.pdf
Checklist:
 2025.findings-emnlp.682.checklist.pdf