TURING: Evaluating Human Abilities to Identify AI-Generated Texts

Natalia Kalashnikova; Nicolas De Bufala; Sophie Fayad; Laurent Cervoni

Abstract

This study analyzes humans’ ability to identify AI-generated texts across 10 genres. We collected 9164 annotations from 214 participants on 500 texts (half human-written, half LLM-produced) and analyzed 7943 of them after quality screening. Our main finding is that human accuracy was above chance but far from perfect (around 59%), with a slight tendency to label texts as "Human-generated". Performance is influenced by the text genre (structural or factual formats are easier to identify than complex genres) and by the generating LLM. Annotators could optionally select three-level descriptors to justify their decisions. While these descriptors had very limited effects on accuracy, their usage showed some association between text features (monotony, lack of cohesion or coherence) and "AI-generated" labeling. However, the linguistic features of the texts appear to have no robust impact on human judgment after correction. A small learning effect emerged but was practically negligible (0.1-0.2%). Personal characteristics of annotators had an impact on their accuracy, except age, which showed no effect. Finally, two automated detection tools were tested, reaching 88% accuracy on our distribution, clearly above humans, highlighting the value of human-tool combinations.
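As a rough illustration of what "above chance but far from perfect" means at this sample size, the sketch below compares an accuracy of about 59% over 7943 annotations against the 50% chance level using a normal approximation to the binomial. It is not the authors' analysis: the function name `accuracy_vs_chance` is hypothetical, the count of correct annotations is back-computed from the rounded "around 59%" figure, and the test treats annotations as independent even though they are clustered by participant and by text.

```python
import math


def accuracy_vs_chance(n_correct, n_total, chance=0.5):
    """Return (accuracy, z, two-sided p) for a one-sample proportion test.

    Uses the normal approximation to the binomial, with the standard error
    computed under the null hypothesis that accuracy equals `chance`.
    """
    acc = n_correct / n_total
    se = math.sqrt(chance * (1 - chance) / n_total)
    z = (acc - chance) / se
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return acc, z, p


# Assumption: the abstract reports "around 59%" accuracy on 7943 screened
# annotations; the exact number of correct annotations is not given there.
n_total = 7943
n_correct = round(0.59 * n_total)

acc, z, p = accuracy_vs_chance(n_correct, n_total)
print(f"accuracy={acc:.3f}  z={z:.1f}  p={p:.2e}")
```

Under these simplifying assumptions the gap from chance is large relative to the sampling error, while the accuracy itself remains well below the 88% reported for the automated tools. An analysis that accounts for repeated measures per annotator and per text would give wider uncertainty.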

Anthology ID: 2026.lrec-main.355
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 4527–4535
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.355/
Cite (ACL): Natalia Kalashnikova, Nicolas De Bufala, Sophie Fayad, and Laurent Cervoni. 2026. TURING: Evaluating Human Abilities to Identify AI-Generated Texts. International Conference on Language Resources and Evaluation, main:4527–4535.
Cite (Informal): TURING: Evaluating Human Abilities to Identify AI-Generated Texts (Kalashnikova et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.355.pdf
