CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning
Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Hang Pu, Yu Lan, Chao Shen
Abstract
Machine-Generated Text (MGT) detection, the task of discriminating MGT from Human-Written Text (HWT), plays a crucial role in preventing the misuse of text generative models, which have recently become adept at mimicking human writing style. Recently proposed detectors usually take coarse text sequences as input and fine-tune pretrained models with a standard cross-entropy loss. However, these methods fail to consider the linguistic structure of texts. Moreover, they cannot handle the low-resource problem, which often arises in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT in low-resource scenarios. To exploit linguistic features, we encode coherence information in the form of a graph into the text representation. To tackle the challenge of limited data, we employ a contrastive learning framework and propose an improved contrastive loss that prevents the performance degradation caused by simple samples. Experimental results on two public datasets and two self-constructed datasets show that our approach significantly outperforms state-of-the-art methods. We also find, surprisingly, that in our experiments MGTs generated by up-to-date language models can be easier to detect than those from earlier models, and we propose some preliminary explanations for this counter-intuitive phenomenon. All code and datasets are open-sourced.
- Anthology ID: 2023.emnlp-main.1005
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 16167–16188
- URL: https://preview.aclanthology.org/add_missing_videos/2023.emnlp-main.1005/
- DOI: 10.18653/v1/2023.emnlp-main.1005
- Cite (ACL): Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Hang Pu, Yu Lan, and Chao Shen. 2023. CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16167–16188, Singapore. Association for Computational Linguistics.
- Cite (Informal): CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning (Liu et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/add_missing_videos/2023.emnlp-main.1005.pdf