Sparse lexicalised features and topic adaptation for SMT

Eva Hasler, Barry Haddow, Philipp Koehn


Abstract
We present a new approach to domain adaptation for SMT that enriches standard phrase-based models with lexicalised word and phrase pair features to help the model select appropriate translations for the target domain (TED talks). In addition, we show how source-side sentence-level topics can be incorporated to make the features differentiate between more fine-grained topics within the target domain (topic adaptation). We compare tuning our sparse features on a development set versus on the entire in-domain corpus and introduce a new method of porting them to larger mixed-domain models. Experimental results show that our features improve performance over a MIRA baseline and that in some cases we can get additional improvements with topic features. We evaluate our methods on two language pairs, English-French and German-English, showing promising results.
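The abstract describes sparse lexicalised phrase-pair features, optionally differentiated by a source-side sentence-level topic. The sketch below is a minimal illustration and is not the authors' implementation. It assumes three things: a derivation is a list of (source phrase, target phrase) pairs, each pair fires one indicator feature, and a topic-conjoined copy of that feature fires when a topic id is given. The feature-name scheme and the functions `phrase_pair_features` and `sparse_score` are hypothetical.

```python
from collections import Counter


def phrase_pair_features(derivation, topic=None):
    """Count sparse indicator features for the phrase pairs in one derivation.

    derivation: list of (source_phrase, target_phrase) tuples (hypothetical format).
    topic: optional source-side sentence topic id; when given, each phrase-pair
    feature is also emitted in a topic-conjoined copy (hypothetical naming).
    """
    feats = Counter()
    for src, tgt in derivation:
        # One lexicalised indicator feature per phrase pair used.
        name = "pp_" + src.replace(" ", "_") + "~" + tgt.replace(" ", "_")
        feats[name] += 1
        if topic is not None:
            # Topic-specific copy lets the tuner learn per-topic preferences.
            feats[f"t{topic}_{name}"] += 1
    return feats


def sparse_score(feats, weights):
    """Linear model contribution: feature counts dotted with tuned weights.

    Features without a weight (e.g. unseen in tuning) contribute 0.
    """
    return sum(count * weights.get(name, 0.0) for name, count in feats.items())
```

For example, `phrase_pair_features([("das haus", "the house"), ("ist", "is")], topic=3)` yields four features: two general phrase-pair indicators and two topic-3 copies. In a real system, the weights for these features would be tuned (e.g. with MIRA, as the abstract mentions) rather than set by hand.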
Anthology ID:
2012.iwslt-papers.17
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
268–275
URL:
https://aclanthology.org/2012.iwslt-papers.17
Cite (ACL):
Eva Hasler, Barry Haddow, and Philipp Koehn. 2012. Sparse lexicalised features and topic adaptation for SMT. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 268–275, Hong Kong.
Cite (Informal):
Sparse lexicalised features and topic adaptation for SMT (Hasler et al., IWSLT 2012)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2012.iwslt-papers.17.pdf