A Quality-based Active Sample Selection Strategy for Statistical Machine Translation

Varvara Logacheva, Lucia Specia


Abstract
This paper presents a new active learning technique for machine translation based on quality estimation of automatically translated sentences. It uses an error-driven strategy, i.e., it assumes that the more errors an automatically translated sentence contains, the more informative it is for the translation system. Our approach relies on a quality estimation technique that draws on a wider range of features of the source text, the automatic translation, and the machine translation system than previous work. In addition, we enhance the machine translation system's training data with post-edited machine translations of the selected sentences, instead of simulating this with previously created reference translations. We found that re-training systems with additional post-edited data yields higher quality translations regardless of the selection strategy used. We attribute this to the fact that post-editions tend to be closer to the source sentences than references are, making the rule extraction process more reliable.
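The error-driven selection step described in the abstract can be illustrated with a minimal sketch. It assumes that each sentence in the unlabelled pool already carries a predicted error score from some quality estimation model, with higher meaning more predicted errors. The function name `select_for_post_editing`, the score scale, and the example pool are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def select_for_post_editing(
    scored_pool: List[Tuple[str, float]], batch_size: int
) -> List[str]:
    """Return the batch_size source sentences with the highest predicted error.

    scored_pool holds (source_sentence, predicted_error) pairs. The scores
    are assumed to come from a quality estimation model that is not shown here.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Rank by predicted error, highest first. sorted() is stable,
    # so sentences with equal scores keep their original pool order.
    ranked = sorted(scored_pool, key=lambda item: item[1], reverse=True)
    return [sentence for sentence, _ in ranked[:batch_size]]


# Hypothetical pool with made-up scores, for illustration only.
pool = [
    ("source sentence A", 0.12),
    ("source sentence B", 0.57),
    ("source sentence C", 0.33),
]
selected = select_for_post_editing(pool, batch_size=2)
```

In the setting the abstract describes, the selected sentences would then be machine translated, post-edited, and added to the training data before the system is re-trained. Those steps are outside the scope of this sketch.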
Anthology ID:
L14-1519
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2690–2695
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/658_Paper.pdf
Cite (ACL):
Varvara Logacheva and Lucia Specia. 2014. A Quality-based Active Sample Selection Strategy for Statistical Machine Translation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2690–2695, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
A Quality-based Active Sample Selection Strategy for Statistical Machine Translation (Logacheva & Specia, LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/658_Paper.pdf