Echoes of the Troubadours: A Corpus of Troubadour Poetry for Stylometric Analysis and Authorship Attribution

Loic De Langhe, Orphee De Clercq, Veronique Hoste


Abstract
We present TrobaCor, a curated corpus of medieval troubadour poetry comprising 1668 unique Old Occitan texts by a wide range of authors. Clustering and stylometric experiments show that authorial style can be modeled accurately beyond topical content, although formulaic or topically diverse genres remain challenging. Furthermore, we can model and detect traces of an author's stylistic "DNA" even in short-form collaborative poetry, offering a uniquely fine-grained perspective in the field. In addition, we provide self-organizing map visualizations that give an interpretable view of stylistic patterns across authors. TrobaCor is publicly released to support reproducible research in NLP and digital humanities on this low-resource historical language.
Anthology ID:
2026.lrec-main.69
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
905–918
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.69/
Cite (ACL):
Loic De Langhe, Orphee De Clercq, and Veronique Hoste. 2026. Echoes of the Troubadours: A Corpus of Troubadour Poetry for Stylometric Analysis and Authorship Attribution. International Conference on Language Resources and Evaluation, main:905–918.
Cite (Informal):
Echoes of the Troubadours: A Corpus of Troubadour Poetry for Stylometric Analysis and Authorship Attribution (De Langhe et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.69.pdf