Applications of data selection via cross-entropy difference for real-world statistical machine translation

Amittai Axelrod, QingJun Li, William D. Lewis


Abstract
We broaden the application of data selection methods for domain adaptation to a larger number of languages, data, and decoders than shown in previous work, and explore comparable applications for both monolingual and bilingual cross-entropy difference methods. We compare domain-adapted systems against very large general-purpose systems for the same languages, and do so without a bias to a particular direction. We present results against real-world general-purpose systems tuned on domain-specific data, which are substantially harder to beat than standard research baseline systems. We show better performance for nearly all domain-adapted systems, despite the fact that the domain-adapted systems are trained on a fraction of the content of their general-domain counterparts. The high performance of these methods suggests applicability to a wide variety of contexts, particularly in scenarios where only small supplies of unambiguously domain-specific data are available, yet it is believed that additional similar data is included in larger heterogeneous-content general-domain corpora.
Anthology ID:
2012.iwslt-papers.8
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
201–208
URL:
https://aclanthology.org/2012.iwslt-papers.8
Cite (ACL):
Amittai Axelrod, QingJun Li, and William D. Lewis. 2012. Applications of data selection via cross-entropy difference for real-world statistical machine translation. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 201–208, Hong Kong.
Cite (Informal):
Applications of data selection via cross-entropy difference for real-world statistical machine translation (Axelrod et al., IWSLT 2012)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2012.iwslt-papers.8.pdf