Stylometric Classification of Ancient Greek Literary Texts by Genre
Efthimios Gianitsos, Thomas Bolt, Pramit Chaudhuri, Joseph P. Dexter
Abstract
Classification of texts by genre is an important application of natural language processing to literary corpora but remains understudied for premodern and non-English traditions. We develop a stylometric feature set for ancient Greek that enables identification of texts as prose or verse. The set contains over 20 primarily syntactic features, which are calculated according to custom, language-specific heuristics. Using these features, we classify almost all surviving classical Greek literature as prose or verse with >97% accuracy and F1 score, and further classify a selection of the verse texts into the traditional genres of epic and drama.
- Anthology ID:
- W19-2507
- Volume:
- Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, USA
- Editors:
- Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
- Venue:
- LaTeCH
- SIG:
- SIGHUM
- Publisher:
- Association for Computational Linguistics
- Pages:
- 52–60
- URL:
- https://aclanthology.org/W19-2507
- DOI:
- 10.18653/v1/W19-2507
- Cite (ACL):
- Efthimios Gianitsos, Thomas Bolt, Pramit Chaudhuri, and Joseph P. Dexter. 2019. Stylometric Classification of Ancient Greek Literary Texts by Genre. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 52–60, Minneapolis, USA. Association for Computational Linguistics.
- Cite (Informal):
- Stylometric Classification of Ancient Greek Literary Texts by Genre (Gianitsos et al., LaTeCH 2019)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/W19-2507.pdf