Spelling Normalization of Historical Documents by Using a Machine Translation Approach

Miguel Domingo; Francisco Casacuberta

Abstract

The lack of a spelling convention in historical documents means that their orthography varies with the author and the time period in which each document was written. This poses a problem for cultural heritage preservation, which strives to create a digital text version of each historical document. To address this problem, we propose three approaches (based on statistical, neural and character-based machine translation) to adapt a document’s spelling to modern standards. We tested these approaches in different scenarios, obtaining very encouraging results.

Anthology ID: 2018.eamt-main.13
Volume: Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Month: May
Year: 2018
Address: Alicante, Spain
Venue: EAMT
Pages: 149–158
URL: https://aclanthology.org/2018.eamt-main.13
Cite (ACL): Miguel Domingo and Francisco Casacuberta. 2018. Spelling Normalization of Historical Documents by Using a Machine Translation Approach. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 149–158, Alicante, Spain.
Cite (Informal): Spelling Normalization of Historical Documents by Using a Machine Translation Approach (Domingo & Casacuberta, EAMT 2018)
PDF: https://preview.aclanthology.org/remove-xml-comments/2018.eamt-main.13.pdf