Ryan Sandell


2024

The authorship of the Homeric poems has been a matter of debate for centuries. Computational approaches such as language modeling exist that can aid experts in making crucial headway. We observe, however, that such work has, thus far, only been carried out at the level of lengthier excerpts, but not individual verses, the level at which most suspected interpolations occur. We address this weakness by presenting a corpus of Homeric verses, each complemented with a score quantifying linguistic unexpectedness based on Perplexity. We assess the nature of these scores by exploring their correlation with named entities, the frequency of character n-grams, and (inverse) word frequency, revealing robust correlations with the latter two. This apparent bias can be partly overcome by simply dividing scores for unexpectedness by the maximum term frequency per verse.