Active Learning for Interactive Neural Machine Translation of Data Streams

Álvaro Peris, Francisco Casacuberta


Abstract
We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences, those worth supervising by a human agent. The user then interactively translates these samples. Once validated, these data are useful for adapting the neural machine translation model. We propose two novel methods for selecting the samples to be validated. We exploit information from the attention mechanism of a neural machine translation system. Our experiments show that including active learning techniques in this pipeline reduces the effort required during the process while increasing the quality of the translation system. It also makes it possible to balance the human effort required to achieve a certain translation quality. Moreover, our neural system outperforms classical approaches by a large margin.
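The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`blocks`, `active_learning_loop`), the block-wise processing, the `budget` parameter, and the callables `translate`, `score`, `supervise`, and `adapt` are all hypothetical placeholders. In particular, `score` stands in for whatever sampling criterion is used; the attention-based criteria proposed in the paper are not reproduced here.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def blocks(stream: Iterable[str], block_size: int) -> Iterator[List[str]]:
    """Group a (possibly unbounded) stream into consecutive blocks."""
    block: List[str] = []
    for sentence in stream:
        block.append(sentence)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # final partial block of a finite stream
        yield block


def active_learning_loop(
    stream: Iterable[str],
    translate: Callable[[str], str],       # hypothetical: current model's output
    score: Callable[[str], float],         # hypothetical: higher = more worth supervising
    supervise: Callable[[str, str], str],  # hypothetical: user corrects a hypothesis
    adapt: Callable[[List[Pair]], None],   # hypothetical: update model with validated pairs
    block_size: int = 4,
    budget: int = 2,
) -> Iterator[Tuple[str, str, bool]]:
    """Yield (source, translation, was_supervised) for every sentence in the stream."""
    for block in blocks(stream, block_size):
        # Rank sentence positions in this block by the sampling score.
        order = sorted(range(len(block)), key=lambda i: score(block[i]), reverse=True)
        selected = set(order[:budget])
        validated: List[Pair] = []
        for i, source in enumerate(block):
            hypothesis = translate(source)
            if i in selected:
                target = supervise(source, hypothesis)
                validated.append((source, target))
                yield source, target, True
            else:
                yield source, hypothesis, False
        # Adapt the model with the pairs validated in this block.
        adapt(validated)
```

With toy stand-ins (for example `str.upper` as the translator and `len` as the score), the loop supervises the `budget` highest-scoring sentences of each block and passes only those validated pairs to `adapt`. The `budget` value is one way to express the trade-off between human effort and translation quality that the abstract mentions.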
Anthology ID:
K18-1015
Volume:
Proceedings of the 22nd Conference on Computational Natural Language Learning
Month:
October
Year:
2018
Address:
Brussels, Belgium
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
151–160
URL:
https://aclanthology.org/K18-1015
DOI:
10.18653/v1/K18-1015
Cite (ACL):
Álvaro Peris and Francisco Casacuberta. 2018. Active Learning for Interactive Neural Machine Translation of Data Streams. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 151–160, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Active Learning for Interactive Neural Machine Translation of Data Streams (Peris & Casacuberta, CoNLL 2018)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/K18-1015.pdf
Code
 lvapeab/nmt-keras