TURING: Evaluating Human Abilities to Identify AI-Generated Texts
Natalia Kalashnikova, Nicolas De Bufala, Sophie Fayad, Laurent Cervoni
Abstract
This study analyzes humans’ ability to identify AI-generated texts across 10 genres. We collected 9164 annotations from 214 participants on 500 texts (half human-written, half LLM-produced) and analyzed 7943 of them after quality screening. Our main finding is that human accuracy was above chance but far from perfect (around 59%), with a slight tendency to label texts as "Human-generated". Performance was influenced by the text genre (structural and factual formats were easier to identify than complex genres) and by the generating LLM. Annotators could optionally select three-level descriptors to justify their decisions. While these descriptors had very limited effects on accuracy, their usage showed some association between text features (monotony, lack of cohesion or coherence) and "AI-generated" labeling. However, the linguistic features of the texts appear to have no robust impact on human judgment after correction. A small learning effect emerged but was practically negligible (0.1-0.2%). Personal characteristics of annotators had an impact on their accuracy, except age, which showed no effect. Finally, two automated detection tools were tested, reaching 88% accuracy on our distribution, clearly above humans, highlighting the value of human-tool combinations.
- Anthology ID:
- 2026.lrec-main.355
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 4527–4535
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.355/
- Cite (ACL):
- Natalia Kalashnikova, Nicolas De Bufala, Sophie Fayad, and Laurent Cervoni. 2026. TURING: Evaluating Human Abilities to Identify AI-Generated Texts. International Conference on Language Resources and Evaluation, main:4527–4535.
- Cite (Informal):
- TURING: Evaluating Human Abilities to Identify AI-Generated Texts (Kalashnikova et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.355.pdf