TURING: Evaluating Human Abilities to Identify AI-Generated Texts

Natalia Kalashnikova, Nicolas De Bufala, Sophie Fayad, Laurent Cervoni


Abstract
This study analyzes humans' ability to identify AI-generated texts across 10 genres. We collected 9,164 annotations from 214 participants on 500 texts (half human-written, half LLM-produced) and analyzed 7,943 of them after quality screening. Our main finding is that human accuracy was above chance but far from perfect (around 59%), with a slight tendency to label texts as "Human-generated". Performance was influenced by text genre (structural/factual formats were easier to identify than complex genres) and by the generating LLM. Annotators could optionally select three-level descriptors to justify their decisions. While these descriptors had very limited effects on accuracy, their usage revealed an association between certain text features (monotony, lack of cohesion or coherence) and "AI-generated" labels. However, the linguistic features of the texts appear to have no robust impact on human judgment after statistical correction. A small learning effect emerged but was practically negligible (0.1-0.2%), and annotators' personal characteristics affected their accuracy, with the exception of age, which showed no effect. Finally, two automated detection tools were tested, reaching 88% accuracy on our distribution, clearly above humans, highlighting the value of human-tool combinations.
Anthology ID:
2026.lrec-main.355
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
4527–4535
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.355/
Cite (ACL):
Natalia Kalashnikova, Nicolas De Bufala, Sophie Fayad, and Laurent Cervoni. 2026. TURING: Evaluating Human Abilities to Identify AI-Generated Texts. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 4527–4535, Palma de Mallorca, Spain.
Cite (Informal):
TURING: Evaluating Human Abilities to Identify AI-Generated Texts (Kalashnikova et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.355.pdf