A simple and effective weighted phrase extraction for machine translation adaptation

Saab Mansour, Hermann Ney


Abstract
The task of domain adaptation attempts to exploit data mainly drawn from one domain (e.g. news) to maximize performance on the test domain (e.g. weblogs). In previous work, weights on the training instances were used to filter out dissimilar data. We extend this by incorporating the weights directly into the standard phrase training procedure of statistical machine translation (SMT). This lets the SMT system itself decide whether to use a phrase translation pair, which is more principled than discarding phrase pairs completely as filtering does. Furthermore, we suggest a combined filtering and weighting procedure to achieve better results while reducing the phrase table size. The proposed methods are evaluated on Arabic-to-English translation under various conditions, and significant improvements are reported when using the suggested weighted phrase training. The weighting method also improves over filtering, and the combined filtering and weighting is better than a standalone filtering method. Finally, we experiment with mixture modeling, where weighted phrase extraction yields additional improvements over a variety of baselines.
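The idea of folding instance weights into phrase training can be illustrated with a minimal sketch. It assumes that each training sentence pair already carries a scalar weight (how those weights are computed is not shown here) and that phrase pairs have already been extracted from each sentence pair. Each extracted occurrence then contributes its sentence weight, rather than 1, to the counts used for the relative-frequency estimate p(target | source). The optional `min_weight` threshold mimics the combined procedure by dropping low-weight sentences before weighting the rest. The function name `weighted_phrase_table` and its signature are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def weighted_phrase_table(
    corpus: Iterable[Tuple[float, List[PhrasePair]]],
    min_weight: Optional[float] = None,
) -> Dict[PhrasePair, float]:
    """Hypothetical sketch: estimate p(target | source) from weighted counts.

    Each corpus item is (sentence weight, phrase pairs already extracted
    from that sentence pair). Sentence weights are assumed to be given.
    If min_weight is set, sentences below it are discarded first
    (filtering), and the remaining ones are weighted (weighting).
    """
    pair_counts: Dict[PhrasePair, float] = defaultdict(float)
    source_counts: Dict[str, float] = defaultdict(float)
    for weight, pairs in corpus:
        if min_weight is not None and weight < min_weight:
            continue  # filtering step: drop the whole sentence pair
        if weight <= 0:
            continue  # contributes nothing; also avoids division by zero
        for src, tgt in pairs:
            # weighting step: each occurrence adds its sentence weight
            pair_counts[(src, tgt)] += weight
            source_counts[src] += weight
    # relative frequency over weighted counts
    return {
        (src, tgt): count / source_counts[src]
        for (src, tgt), count in pair_counts.items()
    }


# Toy corpus with placeholder phrases "A", "x", "y".
corpus = [
    (1.0, [("A", "x"), ("A", "y")]),
    (0.5, [("A", "x")]),
    (0.1, [("A", "y")]),
]
weighted = weighted_phrase_table(corpus)
combined = weighted_phrase_table(corpus, min_weight=0.2)
print(weighted)
print(combined)
```

With uniform weights both translations of "A" would be seen twice and get 0.5 each. Weighting instead gives p(x | A) = 1.5 / 2.6 ≈ 0.577, because "x" mostly occurs in higher-weight sentences. With `min_weight=0.2` the 0.1-weight sentence is dropped, giving p(x | A) = 0.6 and p(y | A) = 0.4. Low-weight data only shifts probabilities under pure weighting, whereas filtering removes it, and phrase pairs seen only in filtered sentences leave the table.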
Anthology ID:
2012.iwslt-papers.7
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
193–200
URL:
https://aclanthology.org/2012.iwslt-papers.7
Cite (ACL):
Saab Mansour and Hermann Ney. 2012. A simple and effective weighted phrase extraction for machine translation adaptation. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 193–200, Hong Kong.
Cite (Informal):
A simple and effective weighted phrase extraction for machine translation adaptation (Mansour & Ney, IWSLT 2012)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2012.iwslt-papers.7.pdf