Abstract
This paper presents a new active learning technique for machine translation based on quality estimation of automatically translated sentences. It uses an error-driven strategy, i.e., it assumes that the more errors an automatically translated sentence contains, the more informative it is for the translation system. Our approach is based on a quality estimation technique which involves a wider range of features of the source text, automatic translation, and machine translation system compared to previous work. In addition, we enhance the machine translation system training data with post-edited machine translations of the sentences selected, instead of simulating this using previously created reference translations. We found that re-training systems with additional post-edited data yields higher quality translations regardless of the selection strategy used. We relate this to the fact that post-editions tend to be closer to source sentences as compared to references, making the rule extraction process more reliable.
- Anthology ID:
- L14-1519
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 2690–2695
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/658_Paper.pdf
- Cite (ACL):
- Varvara Logacheva and Lucia Specia. 2014. A Quality-based Active Sample Selection Strategy for Statistical Machine Translation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2690–2695, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- A Quality-based Active Sample Selection Strategy for Statistical Machine Translation (Logacheva & Specia, LREC 2014)