A Comparison of Different Punctuation Prediction Approaches in a Translation Context
Vincent Vandeghinste, Lyan Verwimp, Joris Pelemans, Patrick Wambacq
Abstract
We test a series of techniques to predict punctuation and its effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memories (LSTM), sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchical and neural MT. For actual translation, phrase-based, hierarchical and neural MT are investigated. We observe that for punctuation prediction, phrase-based statistical MT and neural MT reach similar results, and are best used as a preprocessing step which is followed by neural MT to perform the actual translation. Implicit punctuation insertion by a dedicated neural MT system, trained on unpunctuated source and punctuated target, yields similar results.
- Anthology ID:
- 2018.eamt-main.27
- Volume:
- Proceedings of the 21st Annual Conference of the European Association for Machine Translation
- Month:
- May
- Year:
- 2018
- Address:
- Alicante, Spain
- Editors:
- Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, Mikel L. Forcada
- Venue:
- EAMT
- Pages:
- 289–298
- URL:
- https://aclanthology.org/2018.eamt-main.27
- Cite (ACL):
- Vincent Vandeghinste, Lyan Verwimp, Joris Pelemans, and Patrick Wambacq. 2018. A Comparison of Different Punctuation Prediction Approaches in a Translation Context. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 289–298, Alicante, Spain.
- Cite (Informal):
- A Comparison of Different Punctuation Prediction Approaches in a Translation Context (Vandeghinste et al., EAMT 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2018.eamt-main.27.pdf