Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays?

Veronica Schmalz; Anaïs Tack

Abstract

Despite recent advances in AI detection methods, their practical application, especially in education, remains limited. Educators need functional tools pointing to AI indicators within texts, rather than merely estimating whether AI was used. GPTZero’s new AI Vocabulary feature, which highlights parts of a text likely to be AI-generated based on frequent words and phrases from LLM-generated texts, offers a potential solution. However, its effectiveness has not yet been empirically validated.

In this study, we examine whether GPTZero’s AI Vocabulary can effectively distinguish between LLM-generated and student-written essays. We analyze the AI Vocabulary lists published from October 2024 to March 2025 and evaluate them on a subset of the Ghostbuster dataset, which includes student and LLM essays. We train multiple Bag-of-Words classifiers using GPTZero’s AI Vocabulary terms as features and examine their individual contributions to classification.

Our findings show that simply checking for the presence, not the frequency, of specific AI terms yields the best results, particularly with ChatGPT-generated essays. However, performance drops to near-random when applied to Claude-generated essays, indicating that GPTZero’s AI Vocabulary may not generalize well to texts generated by LLMs other than ChatGPT. Additionally, all classifiers based on GPTZero’s AI Vocabulary significantly underperform compared to Bag-of-Words classifiers trained directly on the full dataset vocabulary. These findings suggest that fixed vocabularies based solely on lexical features, despite their interpretability, have limited effectiveness across different LLMs and educational writing contexts.
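
The distinction the abstract draws between the presence and the frequency of vocabulary terms can be illustrated with a minimal sketch. It is not the authors' implementation: the three terms in `VOCAB` are hypothetical placeholders rather than entries from GPTZero's published lists, and the function names are invented for illustration. Each essay is mapped to one feature per term, either a count or a 0/1 indicator, and either vector could then be passed to a classifier.

```python
import re

# Hypothetical placeholder terms; NOT GPTZero's actual AI Vocabulary list.
VOCAB = ["delve into", "play a crucial role", "tapestry"]


def count_features(text, vocab=VOCAB):
    """Frequency features: number of whole-phrase matches of each term."""
    lowered = text.lower()
    return [
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in vocab
    ]


def presence_features(text, vocab=VOCAB):
    """Presence features: 1 if the term occurs at least once, else 0."""
    return [1 if count > 0 else 0 for count in count_features(text, vocab)]


essay = "We delve into history. Then we delve into a rich tapestry of ideas."
print(count_features(essay))     # [2, 0, 1]
print(presence_features(essay))  # [1, 0, 1]
```

Under the presence encoding, an essay that repeats a term many times looks the same as one that uses it once, so the classifier can only learn from whether a term appears at all.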

Anthology ID: 2025.bea-1.71
Volume: Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 937–952
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.71/
Cite (ACL): Veronica Schmalz and Anaïs Tack. 2025. Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays? In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 937–952, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays? (Schmalz & Tack, BEA 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.71.pdf
