Nicolás Serrano
Also published as: Nicolas Serrano
2011
Handwritten Text Recognition for Historical Documents
Verónica Romero | Nicolás Serrano | Alejandro H. Toselli | Joan Andreu Sánchez | Enrique Vidal
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage
2010
The RODRIGO Database
Nicolas Serrano | Francisco Castro | Alfons Juan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Annotation of digitized pages from historical document collections is very important to research on automatic extraction of text blocks, lines, and handwriting recognition. We have recently introduced a new handwritten text database, GERMANA, which is based on a Spanish manuscript from 1891. To our knowledge, GERMANA is the first publicly available database mostly written in Spanish and comparable in size to standard databases. In this paper, we present another handwritten text database, RODRIGO, completely written in Spanish and comparable in size to GERMANA. However, RODRIGO comes from a much older manuscript, from 1545, where the typical difficult characteristics of historical documents are more evident. In particular, the writing style, which has clear Gothic influences, is significantly more complex than that of GERMANA. We also provide baseline results of handwriting recognition for reference in future studies, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
Co-authors
- Francisco Castro 1
- Alfons Juan 1
- Verónica Romero 1
- Alejandro H. Toselli 1
- Joan-Andreu Sánchez 1