Thomas Bolt


2024

Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature
Sarah Chen | Patrick Burns | Thomas Bolt | Pramit Chaudhuri | Joseph Dexter
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)

In literary critical applications, stylometry can benefit from hand-curated feature sets capturing various syntactic and rhetorical functions. For premodern languages, calculation of such features is hampered by a lack of adequate computational resources for accurate part-of-speech tagging and semantic disambiguation. This paper reports an evaluation of POS-taggers for Latin and their use in augmenting a hand-curated stylometric feature set. Our experiments show that POS-augmented features not only provide more accurate counts than POS-blind features but also perform better on tasks such as genre classification. In the course of this work we introduce POS n-grams as a feature for Latin stylometry.
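
As a rough illustration of the POS n-gram feature mentioned in the abstract, the sketch below counts n-grams over a sequence of part-of-speech tags and converts them to relative frequencies. It is not the authors' implementation. The function name `pos_ngrams`, the use of Universal POS labels, and the hand-tagged example sequence are all assumptions made for illustration; in practice the tags would come from a Latin POS tagger such as those the paper evaluates.

```python
from collections import Counter


def pos_ngrams(tags, n=2):
    """Return relative frequencies of POS n-grams in a tag sequence.

    `tags` is a list of POS labels, one per token (hypothetical input;
    a real pipeline would obtain these from a tagger).
    """
    # Slide a window of width n over the tag sequence.
    grams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than n: no n-grams to report.
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Illustrative hand-assigned tags for "arma virum -que cano",
# with the enclitic split off as its own token (an assumption).
example_tags = ["NOUN", "NOUN", "CCONJ", "VERB"]
features = pos_ngrams(example_tags, n=2)
```

Each key of `features` is a tuple of tags, so the resulting dictionary can be passed to a vectorizer or merged with other stylometric features before classification.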

2019

Stylometric Classification of Ancient Greek Literary Texts by Genre
Efthimios Gianitsos | Thomas Bolt | Pramit Chaudhuri | Joseph P. Dexter
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Classification of texts by genre is an important application of natural language processing to literary corpora but remains understudied for premodern and non-English traditions. We develop a stylometric feature set for ancient Greek that enables identification of texts as prose or verse. The set contains over 20 primarily syntactic features, which are calculated according to custom, language-specific heuristics. Using these features, we classify almost all surviving classical Greek literature as prose or verse with >97% accuracy and F1 score, and further classify a selection of the verse texts into the traditional genres of epic and drama.