A Dataset of Historical Medical Periodicals Annotated with Textual Genre

Vera Danilova, Sara Stymne


Abstract
Historical corpora, especially those compiled from magazines and periodicals, are complex due to the diversity of text types and evolving genre conventions. Addressing these challenges requires systematic genre annotation and well-defined classification schemes to support downstream NLP tasks. This paper introduces a dataset of historical medical periodical texts in German and Swedish annotated for textual genre and for additional features that may influence genre identification, such as the presence of OCR errors. We describe the development of the genre classification, annotator recruitment and training procedures, and provide an analysis of inter-annotator agreement.
Anthology ID:
2026.lrec-main.75
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
973–984
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.75/
Cite (ACL):
Vera Danilova and Sara Stymne. 2026. A Dataset of Historical Medical Periodicals Annotated with Textual Genre. International Conference on Language Resources and Evaluation, main:973–984.
Cite (Informal):
A Dataset of Historical Medical Periodicals Annotated with Textual Genre (Danilova & Stymne, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.75.pdf