Abstract
Translating noisy inputs, such as the output of a speech recognizer, is a difficult but important challenge for neural machine translation. One way to increase the robustness of neural models is to introduce artificial noise into the training data. In this paper, we experiment with appropriate forms of such noise, exploring a middle ground between general-purpose regularizers and highly task-specific forms of noise induction. We show that with a simple generative noise model, moderate gains can be achieved in translating erroneous speech transcripts, provided that the type and amount of noise are properly calibrated. The optimal amount of noise at training time is much smaller than the amount of noise in our test data, indicating limitations due to trainability issues. We note that unlike our baseline model, models trained on noisy data are able to generate outputs of proper length even for noisy inputs, while gradually reducing output length for higher amounts of noise, as might also be expected from a human translator. We discuss these findings in detail and give suggestions for future work.
- Anthology ID:
- 2017.iwslt-1.13
- Volume:
- Proceedings of the 14th International Conference on Spoken Language Translation
- Month:
- December 14-15
- Year:
- 2017
- Address:
- Tokyo, Japan
- Editors:
- Sakriani Sakti, Masao Utiyama
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Publisher:
- International Workshop on Spoken Language Translation
- Pages:
- 90–96
- URL:
- https://aclanthology.org/2017.iwslt-1.13
- Cite (ACL):
- Matthias Sperber, Jan Niehues, and Alex Waibel. 2017. Toward Robust Neural Machine Translation for Noisy Input Sequences. In Proceedings of the 14th International Conference on Spoken Language Translation, pages 90–96, Tokyo, Japan. International Workshop on Spoken Language Translation.
- Cite (Informal):
- Toward Robust Neural Machine Translation for Noisy Input Sequences (Sperber et al., IWSLT 2017)
- PDF:
- https://preview.aclanthology.org/landing_page/2017.iwslt-1.13.pdf
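The abstract describes injecting calibrated artificial noise into source-side training data. The paper's actual noise model is not reproduced here; the sketch below is a minimal Python illustration of the general idea (per-token substitutions, deletions, and insertions at a tunable rate), with all names (`noisify`, `noise_prob`, `vocab`) being hypothetical rather than the authors' implementation.

```python
import random

def noisify(tokens, vocab, noise_prob=0.05, rng=random):
    """Corrupt a token sequence with simple synthetic noise.

    With probability `noise_prob` per token, apply one of three edits
    (substitution, deletion, or insertion), chosen uniformly. Per the
    paper's findings, the optimal training-time rate is typically much
    smaller than the error rate of test-time speech transcripts.
    Illustrative sketch only, not the authors' code.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < noise_prob / 3:
            # substitution: replace the token with a random vocabulary word
            noisy.append(rng.choice(vocab))
        elif r < 2 * noise_prob / 3:
            # deletion: drop the token entirely
            continue
        elif r < noise_prob:
            # insertion: keep the token and add a random word after it
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        else:
            # no noise: keep the token unchanged
            noisy.append(tok)
    return noisy

# Example: corrupt a source sentence before feeding it to NMT training.
vocab = ["the", "a", "cat", "dog", "sat", "on", "mat"]
print(noisify("the cat sat on the mat".split(), vocab, noise_prob=0.3))
```

In this framing, `noise_prob` is the quantity the abstract says must be calibrated: too little noise gives no robustness benefit, while matching the (higher) noise level of real ASR output hurts trainability.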