Adapting to All Domains at Once: Rewarding Domain Invariance in SMT

Hoang Cuong, Khalil Sima’an, Ivan Titov


Abstract
Existing work on domain adaptation for statistical machine translation has consistently assumed access to a small sample from the test distribution (target domain) at training time. In practice, however, the target domain may not be known at training time or it may change to match user needs. In such situations, it is natural to push the system to make safer choices, giving higher preference to domain-invariant translations, which work well across domains, over risky domain-specific alternatives. We encode this intuition by (1) inducing latent subdomains from the training data only; (2) introducing features which measure how specialized phrases are to individual induced subdomains; (3) estimating feature weights on out-of-domain data (rather than on the target domain). We conduct experiments on three language pairs and a number of different domains. We observe consistent improvements over a baseline which does not explicitly reward domain invariance.
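As an illustration of step (2), the minimal Python sketch below (all names hypothetical; not taken from the paper or the UDIT code) scores a phrase pair by the normalized entropy of its posterior distribution over the induced subdomains: a near-uniform posterior suggests a domain-invariant phrase, while a peaked one suggests a domain-specific phrase.

import math

def invariance_feature(posteriors):
    # posteriors: hypothetical P(subdomain | phrase pair), summing to 1.
    # Normalized entropy in [0, 1]: 1.0 means the phrase occurs evenly
    # across subdomains (domain-invariant); values near 0 mean it is
    # concentrated in one subdomain (domain-specific).
    entropy = -sum(p * math.log(p) for p in posteriors if p > 0.0)
    return entropy / math.log(len(posteriors))

# Two toy phrase pairs over four induced subdomains:
print(invariance_feature([0.25, 0.25, 0.25, 0.25]))  # 1.0  -> invariant
print(invariance_feature([0.94, 0.02, 0.02, 0.02]))  # ~0.21 -> specific

Under the approach described in the abstract, a score of this kind would enter the model as a log-linear feature whose weight is tuned on out-of-domain data, per step (3).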
Anthology ID:
Q16-1008
Volume:
Transactions of the Association for Computational Linguistics, Volume 4
Year:
2016
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova
Venue:
TACL
Publisher:
MIT Press
Pages:
99–112
URL:
https://aclanthology.org/Q16-1008
DOI:
10.1162/tacl_a_00086
Cite (ACL):
Hoang Cuong, Khalil Sima’an, and Ivan Titov. 2016. Adapting to All Domains at Once: Rewarding Domain Invariance in SMT. Transactions of the Association for Computational Linguistics, 4:99–112.
Cite (Informal):
Adapting to All Domains at Once: Rewarding Domain Invariance in SMT (Cuong et al., TACL 2016)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/Q16-1008.pdf
Code
 hoangcuong2011/UDIT
Data
Europarl