Christopher Thomson

2024

Evaluating Vocabulary Usage in LLMs
Matthew Durward | Christopher Thomson
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

This paper investigates vocabulary usage in AI-generated and human-written text. We define vocabulary usage in two ways: structural differences and keyword differences. Structural differences are evaluated by converting text into Vocabulary-Management Profiles (VMPs), originally used for discourse analysis. VMPs let us treat text as a time series, so we can apply dynamic time warping (DTW) distance measures and derive similarity scores that indicate whether the structural dynamics of AI texts resemble those of human texts. To analyze keywords, we use a measure that emphasizes frequency and dispersion to source ‘key’ keywords. A qualitative approach is then applied, noting thematic differences between human and AI writing.
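To make the structural comparison concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a simplified stand-in for a VMP (the share of first-occurrence words in each sliding window) and uses the textbook dynamic-programming form of dynamic time warping with an absolute-difference cost. The function names, the window size, and the mapping from distance to similarity are all hypothetical choices made for illustration.

```python
from math import inf


def new_word_profile(tokens, window=5):
    """Share of first-occurrence tokens in each sliding window.

    Assumption: a simplified stand-in for a Vocabulary-Management Profile,
    not the exact profile definition used in the paper.
    """
    seen = set()
    is_new = []
    for tok in tokens:
        key = tok.lower()
        is_new.append(0 if key in seen else 1)
        seen.add(key)
    if not is_new:
        return []
    if len(is_new) < window:
        # Text shorter than one window: return a single averaged value.
        return [sum(is_new) / len(is_new)]
    return [
        sum(is_new[i:i + window]) / window
        for i in range(len(is_new) - window + 1)
    ]


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def similarity(a, b):
    """Map a DTW distance into (0, 1]; an illustrative choice of scale."""
    return 1.0 / (1.0 + dtw_distance(a, b))
```

Under these assumptions, a human text and an AI text would each be tokenized, passed through `new_word_profile`, and compared with `similarity`. Identical profiles score 1.0, and the score falls toward 0 as the warped distance grows.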

Co-authors

Matthew Durward 1

Venues

bea1