Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources
Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou, Christian Viard-Gaudin
Abstract
Lack of data can be an issue when beginning a new study on historical handwritten documents. In order to deal with this, we present the character-based decoder part of a multilingual approach based on transductive transfer learning for a historical handwriting recognition task on Italian Comedy Registers. The decoder must build a sequence of characters that corresponds to a word from a vector of letter-ngrams. As learning data, we created a new dataset from untapped resources that covers the same domain and period of our Italian Comedy data, as well as resources from common domains, periods, or languages. We obtain a 97.42% Character Recognition Rate and a 86.57% Word Recognition Rate on our Italian Comedy data, despite a lexical coverage of 67% between the Italian Comedy data and the training data. These results show that an efficient system can be obtained by a carefully selecting the datasets used for the transfer learning.- Anthology ID:
- C18-1125
- Volume:
- Proceedings of the 27th International Conference on Computational Linguistics
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue:
- COLING
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1474–1484
- Language:
- URL:
- https://aclanthology.org/C18-1125
- DOI:
- Cite (ACL):
- Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou, and Christian Viard-Gaudin. 2018. Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1474–1484, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources (Granet et al., COLING 2018)
- PDF:
- https://preview.aclanthology.org/naacl24-info/C18-1125.pdf
- Data
- George Washington