Abstract
The lack of a spelling convention in historical documents means that their orthography changes depending on the author and the time period in which each document was written. This is a problem for the preservation of cultural heritage, which strives to create digital text versions of historical documents. To address this problem, we propose three approaches, based on statistical, neural and character-based machine translation, to adapt a document's spelling to modern standards. We tested these approaches in different scenarios, obtaining very encouraging results.
- Anthology ID:
- 2018.eamt-main.13
- Volume:
- Proceedings of the 21st Annual Conference of the European Association for Machine Translation
- Month:
- May
- Year:
- 2018
- Address:
- Alicante, Spain
- Venue:
- EAMT
- Pages:
- 149–158
- URL:
- https://aclanthology.org/2018.eamt-main.13
- Cite (ACL):
- Miguel Domingo and Francisco Casacuberta. 2018. Spelling Normalization of Historical Documents by Using a Machine Translation Approach. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 149–158, Alicante, Spain.
- Cite (Informal):
- Spelling Normalization of Historical Documents by Using a Machine Translation Approach (Domingo & Casacuberta, EAMT 2018)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2018.eamt-main.13.pdf