Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood
Abstract
Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities for generating human-like text keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones, and by extracting useful features from the spectrum view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, one supervised and one heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state of the art on short-text detection. Our method can also reveal subtle differences between human and model languages, which have theoretical roots in psycholinguistic studies.
- Anthology ID:
- 2024.emnlp-main.564
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10108–10121
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.564/
- DOI:
- 10.18653/v1/2024.emnlp-main.564
- Cite (ACL):
- Yang Xu, Yu Wang, Hao An, Zhichen Liu, and Yongyuan Li. 2024. Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10108–10121, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood (Xu et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.564.pdf
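
The abstract's idea of a spectrum view over relative likelihood can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how relative likelihood or the spectrum is computed, so the sketch assumes z-score normalization of per-token log-likelihoods and a plain discrete Fourier transform. The function names `relative_likelihood` and `magnitude_spectrum` and the sample values in `log_probs` are hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev


def relative_likelihood(log_probs):
    """Z-score the per-token log-likelihoods (an assumed normalization)."""
    mu = mean(log_probs)
    sigma = pstdev(log_probs)
    if sigma == 0:
        # A constant sequence carries no relative information.
        return [0.0 for _ in log_probs]
    return [(x - mu) / sigma for x in log_probs]


def magnitude_spectrum(values):
    """Magnitudes of the discrete Fourier transform for frequencies 0..n//2."""
    n = len(values)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(values)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum


# Hypothetical per-token log-likelihoods; a language model would supply these.
log_probs = [-2.1, -0.4, -3.7, -1.2, -0.9, -4.5, -0.3, -2.8]

# Spectral features that a supervised or heuristic classifier could consume.
features = magnitude_spectrum(relative_likelihood(log_probs))
```

Under these assumptions, adding a constant to every value in `log_probs` leaves `features` unchanged, which is one way to see what using relative rather than absolute likelihood means.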