Chen Ding
2026
Ranking Human and LLM Texts Using Locality Statistics
Yiyang Wang | Chen Ding | Hangfeng He
Findings of the Association for Computational Linguistics: EACL 2026
The paper extends the Data Movement Distance (DMD), a metric defined to measure locality in computer memory, to text by defining a normalized version called nDMD. A key feature of nDMD is a new term designed to better characterize low-frequency tokens. Evaluating nDMD on the English subset of the M4 dataset and on the GenAI detection shared task, the paper reports three key findings. First, nDMD is systematically higher in human-written text than in machine-generated text. Second, nDMD-based features not only outperform frequency baselines but also improve overall performance when combined with them. Finally, the proposed DMD normalization is more effective at distinguishing human from machine text than alternative normalization approaches.