Paraphrasing Attack Resilience of Various Machine-Generated Text Detection Methods

Andrii Shportko; Inessa Verbitsky

Abstract

The recent large-scale emergence of LLMs has left open the problem of dealing with their consequences, such as plagiarism and the spread of false information on the Internet. Coupled with the rise of tools for bypassing AI detectors, this has put reliable machine-generated text detection in increasingly high demand. We investigate the paraphrasing attack resilience of various machine-generated text detection methods, evaluating three approaches: fine-tuned RoBERTa, Binoculars, and text feature analysis, along with their ensembles using Random Forest classifiers. We find that Binoculars-inclusive ensembles yield the strongest results, but they also suffer the most significant losses during attacks. In this paper, we present the dichotomy of performance versus resilience in AI text detection, which complicates the current perception of reliability among state-of-the-art techniques.
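To make the notion of "losses during attacks" concrete, the following is a minimal sketch of how a detector's performance drop under paraphrasing could be measured. It is not the authors' evaluation code: the choice of accuracy as the metric, the 0.5 decision threshold, the function names `accuracy` and `attack_loss`, and all scores are assumptions made up for illustration.

```python
from typing import Sequence


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of texts classified correctly.

    A score at or above the threshold is read as 'machine-generated' (label 1).
    The threshold of 0.5 is an illustrative assumption.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def attack_loss(
    clean_scores: Sequence[float],
    attacked_scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> float:
    """Accuracy on the original texts minus accuracy on their paraphrased versions."""
    return accuracy(clean_scores, labels, threshold) - accuracy(attacked_scores, labels, threshold)


# Made-up detector scores for four texts (1 = machine-generated, 0 = human-written).
labels = [1, 1, 0, 0]
clean = [0.9, 0.8, 0.1, 0.2]     # scores on the original texts
attacked = [0.4, 0.7, 0.1, 0.2]  # scores after paraphrasing the machine-generated texts

print(attack_loss(clean, attacked, labels))
```

Under this framing, a detector with high accuracy on original texts can still show a large `attack_loss`, which is the performance-versus-resilience tension the abstract describes.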

Anthology ID: 2025.naacl-srw.46
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: April
Year: 2025
Address: Albuquerque, USA
Editors: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 474–484
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.46/
Cite (ACL): Andrii Shportko and Inessa Verbitsky. 2025. Paraphrasing Attack Resilience of Various Machine-Generated Text Detection Methods. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 474–484, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal): Paraphrasing Attack Resilience of Various Machine-Generated Text Detection Methods (Shportko & Verbitsky, NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.46.pdf
