Abstract
We present a new approach to domain adaptation for SMT that enriches standard phrase-based models with lexicalised word and phrase pair features to help the model select appropriate translations for the target domain (TED talks). In addition, we show how source-side sentence-level topics can be incorporated to make the features differentiate between more fine-grained topics within the target domain (topic adaptation). We compare tuning our sparse features on a development set versus on the entire in-domain corpus and introduce a new method of porting them to larger mixed-domain models. Experimental results show that our features improve performance over a MIRA baseline and that in some cases we can get additional improvements with topic features. We evaluate our methods on two language pairs, English-French and German-English, showing promising results.
- Anthology ID:
- 2012.iwslt-papers.17
- Volume:
- Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
- Month:
- December 6-7
- Year:
- 2012
- Address:
- Hong Kong
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Pages:
- 268–275
- URL:
- https://aclanthology.org/2012.iwslt-papers.17
- Cite (ACL):
- Eva Hasler, Barry Haddow, and Philipp Koehn. 2012. Sparse lexicalised features and topic adaptation for SMT. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 268–275, Hong Kong.
- Cite (Informal):
- Sparse lexicalised features and topic adaptation for SMT (Hasler et al., IWSLT 2012)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2012.iwslt-papers.17.pdf