Ranking Human and LLM Texts Using Locality Statistics

Yiyang Wang, Chen Ding, Hangfeng He


Abstract
The paper extends the Data Movement Distance (DMD), a metric defined to measure locality in computer memory, to text by defining a normalized version called nDMD. A key feature of nDMD is a new term designed to better characterize low-frequency tokens. Evaluating nDMD on the English subset of the M4 dataset and on the GenAI detection shared task, the paper reports three key findings. First, nDMD is systematically higher in human-written text than in machine-generated text. Second, nDMD-based features not only outperform frequency baselines but also improve overall performance when combined with them. Finally, the proposed DMD normalization is more effective at distinguishing human from machine text than alternative normalization approaches.
Anthology ID:
2026.findings-eacl.283
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5337–5348
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.283/
Cite (ACL):
Yiyang Wang, Chen Ding, and Hangfeng He. 2026. Ranking Human and LLM Texts Using Locality Statistics. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5337–5348, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Ranking Human and LLM Texts Using Locality Statistics (Wang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.283.pdf
Checklist:
 2026.findings-eacl.283.checklist.pdf