How Large Language Models are Transforming Machine-Paraphrase Plagiarism

Jan Philip Wahle, Terry Ruas, Frederic Kirstein, Bela Gipp


Abstract
The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, the role of large autoregressive models in generating machine-paraphrased plagiarism and their detection is still incipient in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large language models can rewrite text that humans have difficulty identifying as machine-paraphrased (53% mean acc.). Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves 66% F1-score in detecting paraphrases. We make our code, data, and findings publicly available to facilitate the development of detection solutions.
Anthology ID:
2022.emnlp-main.62
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
952–963
URL:
https://aclanthology.org/2022.emnlp-main.62
DOI:
10.18653/v1/2022.emnlp-main.62
Cite (ACL):
Jan Philip Wahle, Terry Ruas, Frederic Kirstein, and Bela Gipp. 2022. How Large Language Models are Transforming Machine-Paraphrase Plagiarism. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 952–963, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
How Large Language Models are Transforming Machine-Paraphrase Plagiarism (Wahle et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.emnlp-main.62.pdf