More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs

Adrián Gude, Roi Santos-Rios, Francis Bond, Dan Flickinger, Carlos Gómez-Rodríguez, Olga Zamaraeva


Abstract
This study contributes to a growing line of research comparing LLM-generated text with human-authored text, in this case English news text. We focus in particular on the evaluation of syntactic properties through formal grammar frameworks. Our analysis compares two generations of LLMs in the context of two human-authored English news datasets from two different years. Employing the Head-Driven Phrase Structure Grammar (HPSG) formalism, we investigate the distributions of syntactic structures and lexical types in AI-generated texts and contrast them with the corresponding distributions in human-authored New York Times (NYT) articles. We use diversity metrics from ecology and information theory to quantify variation in grammatical constructions and lexical types. Our results show that, while English news text has changed little in the given time frame, newer, instruction-tuned LLMs display reduced syntactic and, especially, lexical diversity compared to older, non-instruction-tuned models. These findings motivate future work on the effects of instruction tuning, which, while enhancing coherence and adherence to prompts, may narrow the expressive range of model output.
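As a rough illustration of what "diversity metrics from ecology and information theory" can look like in practice, the sketch below computes Shannon entropy and the Gini-Simpson index over a list of type labels. The abstract does not name the specific metrics used in the paper, so these two are assumptions chosen as common representatives, and the label strings are hypothetical placeholders rather than actual HPSG construction or lexical type names.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution.

    Higher values mean the labels are spread more evenly over more types.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over types of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def simpson_diversity(labels):
    """Gini-Simpson index: probability that two labels drawn at random
    (with replacement) belong to different types."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical label sequences, e.g. one construction label per parsed phrase.
varied = ["type_a", "type_b", "type_c", "type_d"]
uniform = ["type_a", "type_a", "type_a", "type_a"]

print(shannon_entropy(varied), simpson_diversity(varied))
print(shannon_entropy(uniform), simpson_diversity(uniform))
```

Under this sketch, a corpus that reuses a few constructions scores lower on both measures than one that spreads its usage across many, which is the kind of contrast the abstract describes between newer and older models.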
Anthology ID:
2026.acl-long.1803
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
38900–38911
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1803/
Cite (ACL):
Adrián Gude, Roi Santos-Rios, Francis Bond, Dan Flickinger, Carlos Gómez-Rodríguez, and Olga Zamaraeva. 2026. More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 38900–38911, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs (Gude et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1803.pdf
Checklist:
 2026.acl-long.1803.checklist.pdf