Beyond Lemmas and Syntax: Comparing Human and LLM-Generated Scientific Abstracts

Sergei Bagdasarov; Diego Alves

Abstract

In this study, we compare human-written (HWT) and machine-generated (MGT) abstracts of scientific papers, going beyond traditional lexical and syntactic analyses. We use an extensive corpus of publications on computational linguistics submitted to the Association for Computational Linguistics from the mid-1950s to 2022. First, we generate abstracts with three state-of-the-art models (GPT-4o, Llama 3.1 and Qwen 2.5), providing the models with the full texts of the papers, and then compare these abstracts to those written by humans. We study the overall information content of abstracts, operationalised as surprisal, and the distribution of information in abstracts, quantified as local Uniform Information Density (UID); both metrics are related to processing effort. Subsequently, we perform an extrinsic evaluation through topic modelling and clustering using the BERTopic model. Our results show significant differences in both surprisal and UID, suggesting that abstracts generated by Llama are less cognitively demanding and show a more uniform distribution of information. Our topic modelling experiments show greater divergence between humans and LLMs than between LLM pairs. At the same time, Llama abstracts seem to be more semantically similar to those written by humans, in line with previous findings suggesting such similarity at the lexical and syntactic levels.
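To make the two metrics concrete, here is a minimal Python sketch. It does not reproduce the authors' pipeline. It assumes that per-token probabilities have already been obtained from some language model, that surprisal is measured in bits, and that local UID is computed as the mean squared difference between consecutive surprisals (a common formulation, which may differ from the one used in the paper). The function names and the toy probability lists are hypothetical.

```python
import math


def surprisal(probs):
    """Per-token surprisal in bits: s_i = -log2 p(w_i | context)."""
    return [-math.log2(p) for p in probs]


def mean_surprisal(probs):
    """Average surprisal of a text: a proxy for its overall information content."""
    s = surprisal(probs)
    return sum(s) / len(s)


def local_uid(probs):
    """Mean squared change in surprisal between adjacent tokens.

    Assumed formulation: lower values mean information is spread
    more uniformly across the text.
    """
    s = surprisal(probs)
    if len(s) < 2:
        return 0.0
    return sum((b - a) ** 2 for a, b in zip(s, s[1:])) / (len(s) - 1)


# Toy token probabilities (illustrative only, not from the paper's data).
uniform = [0.25, 0.25, 0.25, 0.25]
bursty = [0.5, 0.125, 0.5, 0.125]

print(mean_surprisal(uniform), local_uid(uniform))
print(mean_surprisal(bursty), local_uid(bursty))
```

In this toy case both sequences have the same mean surprisal, but the second alternates between predictable and unpredictable tokens and therefore has a higher local UID score. This shows why the two metrics are reported separately.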

Anthology ID: 2026.lrec-main.304
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 3823–3832
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.304/
Cite (ACL): Sergei Bagdasarov and Diego Alves. 2026. Beyond Lemmas and Syntax: Comparing Human and LLM-Generated Scientific Abstracts. International Conference on Language Resources and Evaluation, main:3823–3832.
Cite (Informal): Beyond Lemmas and Syntax: Comparing Human and LLM-Generated Scientific Abstracts (Bagdasarov & Alves, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.304.pdf
