How Large Language Models are Transforming Machine-Paraphrase Plagiarism

Jan Philip Wahle, Terry Ruas, Frederic Kirstein, Bela Gipp


Abstract
The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, the role of large autoregressive models in generating machine-paraphrased plagiarism and their detection is still incipient in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large language models can rewrite text humans have difficulty identifying as machine-paraphrased (53% mean acc.). Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves 66% F1-score in detecting paraphrases. We make our code, data, and findings publicly available to facilitate the development of detection solutions.
Anthology ID:
2022.emnlp-main.62
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
952–963
URL:
https://aclanthology.org/2022.emnlp-main.62
DOI:
10.18653/v1/2022.emnlp-main.62
Cite (ACL):
Jan Philip Wahle, Terry Ruas, Frederic Kirstein, and Bela Gipp. 2022. How Large Language Models are Transforming Machine-Paraphrase Plagiarism. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 952–963, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
How Large Language Models are Transforming Machine-Paraphrase Plagiarism (Wahle et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-main.62.pdf