Handwritten Character Generation using Y-Autoencoder for Character Recognition Model Training

Tomoki Kitagawa, Chee Siang Leow, Hiromitsu Nishizaki


Abstract
It is well known that deep learning-based optical character recognition (OCR) systems need a large amount of data to train a high-performance character recognizer. However, collecting a large number of realistic handwritten characters is costly. This paper introduces a Y-Autoencoder (Y-AE)-based handwritten character generator that generates multiple Japanese Hiragana characters from a single image, increasing the amount of data available for training a handwritten character recognizer. The adaptive instance normalization (AdaIN) layer allows the generator to be trained, and to generate handwritten character images, without paired-character image labels. The experiments show that the Y-AE could generate Japanese character images which, when used to train the handwritten character recognizer, improved the F1-score from 0.8664 to 0.9281. We further analyzed the usefulness of the Y-AE-based generator using shape images, or out-of-character (OOC) images, whose styles differ from those of the character images used in model training. The results showed that the generator could generate a handwritten image with a style similar to that of the input character.
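The abstract credits the AdaIN layer with letting the generator learn without paired character images, but it does not give the layer's exact configuration. As a minimal sketch, assuming the standard AdaIN formulation AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), the code below renormalizes each channel of a content feature map so that it takes on the per-channel mean and standard deviation of a style feature map. Feature maps are represented as plain Python lists of flattened channels (an illustrative assumption; a real model would operate on framework tensors), and the names `adain` and `_mean_std` are hypothetical. One plausible wiring in a generator of this kind, not confirmed by the abstract, is to take the style statistics from the input handwriting and the content from the target character.

```python
import math
from typing import List, Tuple

# Hypothetical representation: one channel = one flattened feature map.
Channel = List[float]


def _mean_std(values: Channel, eps: float = 1e-5) -> Tuple[float, float]:
    """Return the mean and (biased) standard deviation of one channel.

    eps is added to the variance to avoid division by zero on flat channels.
    """
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)


def adain(content: List[Channel], style: List[Channel], eps: float = 1e-5) -> List[Channel]:
    """Adaptive instance normalization, applied channel by channel.

    Each content channel is normalized to (roughly) zero mean and unit std,
    then rescaled by the std and shifted by the mean of the matching
    style channel. No learned parameters are involved.
    """
    if len(content) != len(style):
        raise ValueError("content and style must have the same number of channels")
    out: List[Channel] = []
    for c_chan, s_chan in zip(content, style):
        c_mu, c_sigma = _mean_std(c_chan, eps)
        s_mu, s_sigma = _mean_std(s_chan, eps)
        out.append([s_sigma * (v - c_mu) / c_sigma + s_mu for v in c_chan])
    return out


if __name__ == "__main__":
    content = [[0.0, 1.0, 2.0, 3.0]]      # one channel, mean 1.5
    style = [[10.0, 10.0, 14.0, 14.0]]    # one channel, mean 12, std 2
    print(adain(content, style))
```

Because AdaIN only swaps first- and second-order channel statistics, the spatial arrangement of the content features (here, their ordering) is preserved while their overall level and spread follow the style input.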
Anthology ID: 2022.lrec-1.799
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 7344–7351
URL: https://aclanthology.org/2022.lrec-1.799
Cite (ACL): Tomoki Kitagawa, Chee Siang Leow, and Hiromitsu Nishizaki. 2022. Handwritten Character Generation using Y-Autoencoder for Character Recognition Model Training. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7344–7351, Marseille, France. European Language Resources Association.
Cite (Informal): Handwritten Character Generation using Y-Autoencoder for Character Recognition Model Training (Kitagawa et al., LREC 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-2/2022.lrec-1.799.pdf