@inproceedings{jeon-etal-2026-unsupervised,
    title = "Unsupervised Detection of {LLM}-Generated Text in {K}orean Using Syntactic and Semantic Cues",
    author = "Jeon, Heejeong and
      Park, MinSu and
      Choi, YunSeok and
      Park, Eunil",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      M{\`a}rquez, Llu{\'\i}s",
    booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.77/",
    pages = "1504--1518",
    isbn = "979-8-89176-386-9",
    abstract = "As Large Language Models (LLMs) are increasingly used for content creation, detecting AI-generated text has become a critical challenge. Prior work has largely focused on English, leaving low-resource languages such as Korean underexplored. We propose an unsupervised detection framework that integrates two complementary signals: syntactic token cohesiveness (TOCSIN) and semantic regeneration similarity (SimLLM). To support evaluation, we construct a Korean pairwise dataset of 1,000 anchors with continuation- and regeneration-style generations and further assess performance across domains (news, research paper abstracts, essays) and model families (GPT-3.5 Turbo, GPT-4o, HyperCLOVA X, LLaMA-3-8B). Without any training, our ensemble achieves up to 0.963 F1 and 0.985 ROC-AUC, outperforming baselines. These results demonstrate that the combination of syntactic and semantic cues enables robust unsupervised detection in low-resource settings. Code available at https://github.com/dxlabskku/llm-detection-main."
}
@comment{Markdown (Informal) citation from the ACL Anthology page:
[Unsupervised Detection of LLM-Generated Text in Korean Using Syntactic and Semantic Cues](https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.77/) (Jeon et al., Findings 2026)
ACL}