Leveraging Human and Machine Preferences for Zero-shot Detection of AI-Generated Text

Lei Jiang, Desheng Wu, Xiaolong Zheng, Cuicui Luo


Abstract
In recent years, the rapid advancement of large language models (LLMs) has enabled generated text to closely mimic human writing, posing significant challenges to the detection of AI-generated content. Current mainstream zero-shot detection methods largely adopt a machine-centric perspective: they rely on proxy models to compute token-level AI-likelihood scores and treat all tokens equally when aggregating those scores into an overall decision. However, such approaches overlook the prediction discrepancies that arise when humans and large language models interpret the same text. We argue that tokens exhibiting greater divergence between human and machine predictions provide stronger clues for determining the authorship of a text. To address this limitation, we propose HAPDA, a human-machine prediction discrepancy adapter for AI-generated text detection (AGTD). The framework consists of two core components: (1) a joint fine-tuning strategy for training paired human-preference and machine-preference models, and (2) a discrepancy-aware reweighting mechanism designed to calibrate token-level detection scores in downstream detectors. Extensive experiments demonstrate that HAPDA consistently and significantly enhances the detection performance of five representative baseline models under various evaluation scenarios.
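The idea of discrepancy-aware reweighting can be illustrated with a minimal sketch. The abstract does not give HAPDA's actual weighting formula, so everything below is an assumption made for illustration: the function names, the use of absolute log-probability gaps, the softmax normalization, and the `temperature` parameter are all hypothetical. The sketch only shows how tokens on which a human-preference model and a machine-preference model disagree more could receive larger weights when a detector's token-level scores are averaged.

```python
import math


def discrepancy_weights(human_logprobs, machine_logprobs, temperature=1.0):
    """Hypothetical weighting: softmax over absolute per-token log-prob gaps.

    Weights are rescaled to sum to the number of tokens, so equal gaps
    reduce to uniform weights of 1.0 (the unweighted baseline).
    """
    if len(human_logprobs) != len(machine_logprobs):
        raise ValueError("both models must score the same tokens")
    if not human_logprobs:
        return []
    gaps = [abs(h - m) / temperature
            for h, m in zip(human_logprobs, machine_logprobs)]
    peak = max(gaps)  # subtract the max for numerical stability
    exps = [math.exp(g - peak) for g in gaps]
    total = sum(exps)
    n = len(gaps)
    return [n * e / total for e in exps]


def reweighted_score(token_scores, weights):
    """Weighted mean of a downstream detector's token-level scores."""
    if len(token_scores) != len(weights):
        raise ValueError("one weight per token score is required")
    return sum(s * w for s, w in zip(token_scores, weights)) / sum(weights)


# Toy values, invented for illustration only.
human = [-1.0, -2.0, -0.5]     # log-probs from an assumed human-preference model
machine = [-1.0, -0.5, -0.5]   # log-probs from an assumed machine-preference model
scores = [0.2, 0.9, 0.4]       # token-level AI-likelihood scores from a detector

weights = discrepancy_weights(human, machine)
plain = sum(scores) / len(scores)
adjusted = reweighted_score(scores, weights)
```

In this toy example only the second token shows a gap between the two models, so it dominates the weighted average and `adjusted` moves toward that token's score, whereas `plain` treats all three tokens equally.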
Anthology ID:
2026.findings-acl.671
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13732–13750
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.671/
Cite (ACL):
Lei Jiang, Desheng Wu, Xiaolong Zheng, and Cuicui Luo. 2026. Leveraging Human and Machine Preferences for Zero-shot Detection of AI-Generated Text. In Findings of the Association for Computational Linguistics: ACL 2026, pages 13732–13750, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Leveraging Human and Machine Preferences for Zero-shot Detection of AI-Generated Text (Jiang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.671.pdf
Checklist:
 2026.findings-acl.671.checklist.pdf