Abstract
Large language models (LLMs) can increasingly generate fluent long-form text, which makes their outputs harder to distinguish from human writing. Existing zero-shot detectors, which focus primarily on token-level distributions, are vulnerable to real-world domain shifts, including different decoding strategies, prompt variations, and attacks. We propose a more robust method that incorporates abstract elements, such as topic or event transitions, as key deciding factors, by training a latent-space model on sequences of events or topics derived from human-written texts. On three different domains, machine generations that are originally inseparable from humans’ on the token level can be better distinguished with our latent-space model, leading to a 31% improvement over strong baselines such as DetectGPT. Our analysis further reveals that, unlike humans, modern LLMs such as GPT-4 select event triggers and transitions differently, an inherent disparity that persists regardless of the generation configurations adopted in real time.
- Anthology ID:
- 2024.findings-emnlp.608
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10394–10408
- URL:
- https://aclanthology.org/2024.findings-emnlp.608
- DOI:
- 10.18653/v1/2024.findings-emnlp.608
- Cite (ACL):
- Yufei Tian, Zeyu Pan, and Nanyun Peng. 2024. Detecting Machine-Generated Long-Form Content with Latent-Space Variables. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10394–10408, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables (Tian et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-emnlp.608.pdf