Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation

Benjamin Marie, Atsushi Fujita


Abstract
We present a new framework for inducing an in-domain phrase table from in-domain monolingual data, which can then be used to adapt a general-domain statistical machine translation system to the target domain. Our method first compiles sets of phrases in the source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters the candidates with a supervised classifier to induce an in-domain phrase table. We experimented on the English–French language pair, in both translation directions and in two domains, and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis, which showed that the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.
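The pipeline outlined in the abstract (phrase sets, Cartesian product, inexpensive features, filtering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the two toy features (length ratio and character overlap), and the fixed linear score that stands in for the supervised classifier are hypothetical and are not the features or the classifier used in the paper. Phrases are assumed to be non-empty strings.

```python
from itertools import product


def candidate_pairs(source_phrases, target_phrases):
    """Cartesian product of the two monolingual phrase sets."""
    return list(product(source_phrases, target_phrases))


def features(src, tgt):
    """Two toy, inexpensive features (hypothetical, not the paper's feature set)."""
    # Ratio of the shorter phrase length to the longer one, in characters.
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    # Jaccard overlap between the character sets of the two phrases.
    src_chars, tgt_chars = set(src), set(tgt)
    char_overlap = len(src_chars & tgt_chars) / len(src_chars | tgt_chars)
    return (len_ratio, char_overlap)


def keep(feats, threshold=0.7):
    """Stand-in for a trained supervised classifier: a fixed linear score."""
    len_ratio, char_overlap = feats
    return 0.5 * len_ratio + 0.5 * char_overlap >= threshold


def induce_phrase_table(source_phrases, target_phrases):
    """Keep only the candidate pairs that the stand-in classifier accepts."""
    return [
        (src, tgt)
        for src, tgt in candidate_pairs(source_phrases, target_phrases)
        if keep(features(src, tgt))
    ]
```

Calling `induce_phrase_table(["patient"], ["patient", "maison"])` with these toy features keeps the identical pair and rejects the other. Note that the Cartesian product grows as the product of the two set sizes, which is why the abstract stresses that the features must be inexpensive to compute.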
Anthology ID:
Q17-1034
Volume:
Transactions of the Association for Computational Linguistics, Volume 5
Year:
2017
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
487–500
URL:
https://aclanthology.org/Q17-1034
DOI:
10.1162/tacl_a_00075
Cite (ACL):
Benjamin Marie and Atsushi Fujita. 2017. Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation. Transactions of the Association for Computational Linguistics, 5:487–500.
Cite (Informal):
Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation (Marie & Fujita, TACL 2017)
PDF:
https://preview.aclanthology.org/ingestion-script-update/Q17-1034.pdf
Data
ASPEC