Threat Scenarios and Best Practices to Detect Neural Fake News

Artidoro Pagnoni, Martin Graciarena, Yulia Tsvetkov


Abstract
In this work, we discuss different threat scenarios posed by neural fake news generated by state-of-the-art language models. Through our experiments, we assess the performance of generated text detection systems under these threat scenarios. For each scenario, we also identify the minimax strategy for the detector, that is, the strategy that minimizes its worst-case loss. This constitutes a set of best practices that practitioners can rely on. In our analysis, we find that detectors are prone to shortcut learning (lack of out-of-distribution generalization) and discuss approaches to mitigate this problem and improve detectors more broadly. Finally, we argue that strong detectors should be released along with new generators.
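As a rough illustration of the minimax idea mentioned in the abstract, the sketch below picks, from a table of detector error rates, the detector whose worst-case error across threat scenarios is smallest. The detector names, scenario names, and all numbers are hypothetical placeholders chosen for illustration; they are not results from the paper, and the function `minimax_choice` is an assumed helper, not part of the authors' released code.

```python
# Hypothetical error rates (NOT taken from the paper): each detector
# configuration maps threat scenarios to an illustrative error rate.
error_rates = {
    "detector_a": {"scenario_1": 0.05, "scenario_2": 0.40},
    "detector_b": {"scenario_1": 0.15, "scenario_2": 0.20},
    "detector_c": {"scenario_1": 0.10, "scenario_2": 0.30},
}


def minimax_choice(table):
    """Return (name, worst_error) for the entry with the smallest worst-case error."""
    # Worst case for each detector: its highest error over all scenarios.
    worst = {name: max(row.values()) for name, row in table.items()}
    # Minimax: choose the detector that minimizes that worst case.
    best = min(worst, key=worst.get)
    return best, worst[best]


best_detector, worst_case = minimax_choice(error_rates)
print(best_detector, worst_case)
```

Under these made-up numbers, a detector that is best in one scenario can still lose to one with a more even profile, which is the kind of trade-off a minimax criterion is meant to surface.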
Anthology ID:
2022.coling-1.106
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1233–1249
URL:
https://aclanthology.org/2022.coling-1.106
Cite (ACL):
Artidoro Pagnoni, Martin Graciarena, and Yulia Tsvetkov. 2022. Threat Scenarios and Best Practices to Detect Neural Fake News. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1233–1249, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Threat Scenarios and Best Practices to Detect Neural Fake News (Pagnoni et al., COLING 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.coling-1.106.pdf
Code:
artidoro/detect-gentext
Data:
LAMBADA, WebText