Abstract
The paper focuses on investigating vocabulary usage for AI and human-generated text. We define vocabulary usage in two ways: structural differences and keyword differences. Structural differences are evaluated by converting text into Vocabulary-Managment Profiles, initially used for discourse analysis. Through VMPs, we can treat the text data as a time series, allowing an evaluation by implementing Dynamic time-warping distance measures and subsequently deriving similarity scores to provide an indication of whether the structural dynamics in AI texts resemble human texts. To analyze keywords, we use a measure that emphasizes frequency and dispersion to source ‘key’ keywords. A qualitative approach is then applied, noting thematic differences between human and AI writing.- Anthology ID:
- 2024.bea-1.22
- Volume:
- Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Ekaterina Kochmar, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 266–282
- Language:
- URL:
- https://aclanthology.org/2024.bea-1.22
- DOI:
- Cite (ACL):
- Matthew Durward and Christopher Thomson. 2024. Evaluating Vocabulary Usage in LLMs. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), pages 266–282, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating Vocabulary Usage in LLMs (Durward & Thomson, BEA 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.bea-1.22.pdf