Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature
Sarah Li Chen, Patrick J. Burns, Thomas J. Bolt, Pramit Chaudhuri, Joseph P. Dexter
Abstract
In literary critical applications, stylometry can benefit from hand-curated feature sets capturing various syntactic and rhetorical functions. For premodern languages, calculation of such features is hampered by a lack of adequate computational resources for accurate part-of-speech tagging and semantic disambiguation. This paper reports an evaluation of POS-taggers for Latin and their use in augmenting a hand-curated stylometric feature set. Our experiments show that POS-augmented features not only provide more accurate counts than POS-blind features but also perform better on tasks such as genre classification. In the course of this work we introduce POS n-grams as a feature for Latin stylometry.
- Anthology ID:
- 2024.ml4al-1.24
- Volume:
- Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
- Month:
- August
- Year:
- 2024
- Address:
- Hybrid in Bangkok, Thailand and online
- Editors:
- John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
- Venues:
- ML4AL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 251–259
- URL:
- https://aclanthology.org/2024.ml4al-1.24
- DOI:
- 10.18653/v1/2024.ml4al-1.24
- Cite (ACL):
- Sarah Li Chen, Patrick J. Burns, Thomas J. Bolt, Pramit Chaudhuri, and Joseph P. Dexter. 2024. Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature. In Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 251–259, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics.
- Cite (Informal):
- Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature (Chen et al., ML4AL-WS 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.ml4al-1.24.pdf