To Overfit or Not to Overfit? An Evaluation of HTR Workflow on 17th-18th Century French Corpus

Marine Tiger


Abstract
This paper presents an evaluation of general Handwritten Text Recognition (HTR) models applied to a 17th- and 18th-century corpus written in modern French, and of the fine-tuning of these models. Our aim was to transcribe a corpus from this period using existing pre-trained models and to assess their performance on such data. While these general models offer broad linguistic coverage, our results show that they are often insufficiently adapted to the specific handwriting nuances and orthographic inconsistencies of early modern French. To improve the results, we fine-tuned a base model on our dataset to obtain a specialized version. Although the fine-tuned model still struggled with highly variable handwriting styles, it significantly improved transcription accuracy and reduced processing time. We then used a semi-automatic post-correction tool to address the remaining errors and integrated Named Entity Recognition (NER) steps for automated TEI-XML encoding. This paper discusses the evaluation results of both the HTR and NER models, and how overfitting can yield better transcriptions on a specific corpus.
Anthology ID:
2026.lrec-main.78
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
1009–1016
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.78/
Cite (ACL):
Marine Tiger. 2026. To Overfit or Not to Overfit? An Evaluation of HTR Workflow on 17th-18th Century French Corpus. International Conference on Language Resources and Evaluation, main:1009–1016.
Cite (Informal):
To Overfit or Not to Overfit? An Evaluation of HTR Workflow on 17th-18th Century French Corpus (Tiger, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.78.pdf