Investigating the detection of Tortured Phrases in Scientific Literature

Puthineath Lay, Martin Lentschat, Cyril Labbe


Abstract
With the help of online tools, unscrupulous authors can now generate pseudo-scientific articles and attempt to publish them. Some of these tools work by replacing or paraphrasing existing text to produce new content, but they tend to generate nonsensical expressions. A recent study introduced the concept of the “tortured phrase”: an unexpected, odd phrase that appears in place of an established fixed expression, e.g. “counterfeit consciousness” instead of “artificial intelligence”. The present study investigates how tortured phrases that are not yet listed can be detected automatically. We conducted several experiments, including non-neural binary classification, neural binary classification, and cosine similarity comparison of the phrase tokens, yielding noticeable results.
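To make the idea of a cosine similarity comparison of phrase tokens concrete, here is a minimal sketch. It is not the authors' implementation: the toy vectors in `TOY_EMBEDDINGS` are invented for illustration, the helper names `cosine_similarity` and `token_similarities` are hypothetical, and the position-by-position token alignment is an assumption about how such a comparison might be set up. A real experiment would presumably use pretrained word embeddings.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real experiment would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "artificial": [0.9, 0.1, 0.2],
    "counterfeit": [0.8, 0.3, 0.1],
    "intelligence": [0.1, 0.9, 0.3],
    "consciousness": [0.2, 0.8, 0.4],
    "banana": [0.1, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def token_similarities(expected_phrase, candidate_phrase, embeddings=TOY_EMBEDDINGS):
    """Compare two phrases token by token (assumed position-aligned)."""
    expected = expected_phrase.lower().split()
    candidate = candidate_phrase.lower().split()
    if len(expected) != len(candidate):
        raise ValueError("phrases must have the same number of tokens")
    return [
        cosine_similarity(embeddings[e], embeddings[c])
        for e, c in zip(expected, candidate)
    ]


if __name__ == "__main__":
    sims = token_similarities("artificial intelligence", "counterfeit consciousness")
    for score in sims:
        print(round(score, 3))
```

In this toy setup, each token of the candidate phrase scores close to its counterpart in the expected phrase while the surface forms differ, which is the kind of signal such a comparison could look for. Any decision threshold would have to be tuned on real data.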
Anthology ID:
2022.sdp-1.4
Volume:
Proceedings of the Third Workshop on Scholarly Document Processing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
32–36
URL:
https://aclanthology.org/2022.sdp-1.4
Cite (ACL):
Puthineath Lay, Martin Lentschat, and Cyril Labbe. 2022. Investigating the detection of Tortured Phrases in Scientific Literature. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 32–36, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Investigating the detection of Tortured Phrases in Scientific Literature (Lay et al., sdp 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.sdp-1.4.pdf