TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation

Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, Dongwon Lee


Abstract
Recent progress in generative language models has enabled machines to generate astonishingly realistic texts. While there are many legitimate applications of such models, there is also a rising need to distinguish machine-generated texts from human-written ones (e.g., fake news detection). However, to the best of our knowledge, there is currently no benchmark environment with datasets and tasks to systematically study the so-called "Turing Test" problem for neural text generation methods. In this work, we present the TURINGBENCH benchmark environment, which comprises (1) a dataset with 200K human- or machine-generated samples across 20 labels: Human, GPT-1, GPT-2_small, GPT-2_medium, GPT-2_large, GPT-2_xl, GPT-2_PyTorch, GPT-3, GROVER_base, GROVER_large, GROVER_mega, CTRL, XLM, XLNET_base, XLNET_large, FAIR_wmt19, FAIR_wmt20, TRANSFORMER_XL, PPLM_distil, PPLM_gpt2; (2) two benchmark tasks, i.e., Turing Test (TT) and Authorship Attribution (AA); and (3) a website with leaderboards. Our preliminary experimental results using TURINGBENCH show that GPT-3 and FAIR_wmt20 are the current winners among all language models tested: their texts are the hardest to distinguish from human-written ones, yielding the lowest F1 scores obtained by five state-of-the-art TT detection models. TURINGBENCH is available at: https://turingbench.ist.psu.edu/
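To make the two tasks in the abstract concrete, the following is a minimal Python sketch of how the 20 labels could be turned into TT and AA targets, and how an F1 score could be computed for a detector. Only the label names come from the abstract. The helper names (`tt_label`, `aa_label`, `binary_f1`), the choice of "machine" as the positive class, and the toy predictions are assumptions made for illustration; they are not the authors' code or the benchmark's official evaluation protocol.

```python
# The 20 labels listed in the abstract (1 human + 19 text generators).
LABELS = [
    "Human", "GPT-1", "GPT-2_small", "GPT-2_medium", "GPT-2_large",
    "GPT-2_xl", "GPT-2_PyTorch", "GPT-3", "GROVER_base", "GROVER_large",
    "GROVER_mega", "CTRL", "XLM", "XLNET_base", "XLNET_large",
    "FAIR_wmt19", "FAIR_wmt20", "TRANSFORMER_XL", "PPLM_distil", "PPLM_gpt2",
]


def tt_label(label: str) -> str:
    """Turing Test (TT) target: collapse every generator into 'machine'."""
    return "human" if label == "Human" else "machine"


def aa_label(label: str) -> int:
    """Authorship Attribution (AA) target: one class index per author label."""
    return LABELS.index(label)


def binary_f1(y_true, y_pred, positive="machine"):
    """F1 for one positive class (assumed here to be 'machine')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: made-up detector outputs on two GPT-3 and two human samples.
authors = ["GPT-3", "GPT-3", "Human", "Human"]
y_true = [tt_label(a) for a in authors]
y_pred = ["machine", "human", "human", "machine"]
score = binary_f1(y_true, y_pred)
```

Under this reading, a lower detector F1 on a given generator's texts means those texts are harder to tell apart from human writing, which is the sense in which the abstract names GPT-3 and FAIR_wmt20 the current winners.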
Anthology ID:
2021.findings-emnlp.172
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2001–2016
URL:
https://aclanthology.org/2021.findings-emnlp.172
DOI:
10.18653/v1/2021.findings-emnlp.172
Cite (ACL):
Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, and Dongwon Lee. 2021. TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2001–2016, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation (Uchendu et al., Findings 2021)
PDF:
https://preview.aclanthology.org/naacl24-info/2021.findings-emnlp.172.pdf
Video:
 https://preview.aclanthology.org/naacl24-info/2021.findings-emnlp.172.mp4