Conditional Significance Pruning: Discarding More of Huge Phrase Tables

Howard Johnson


Abstract
Pruning the phrase tables used in statistical machine translation (SMT) can substantially reduce their size and improve translation quality, especially for very large corpora such as Giga-FrEn. Conditioning each significance test on other phrase pair co-occurrence counts improves on this further, yielding an additional reduction in size and an increase in BLEU score. A series of experiments using Moses and the WMT11 corpora for French to English was performed to quantify the improvement. Adhering strictly to the recommendations for the WMT11 baseline system provided a strong, reproducible research baseline.
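As a rough illustration of the kind of significance test the abstract refers to, the sketch below scores a single phrase pair with a one-sided Fisher exact test on its co-occurrence counts. It assumes the unconditional test that significance pruning is commonly built on; the conditional variant described in the abstract is not reproduced here, because the abstract does not specify it. The function names, argument names, and threshold are hypothetical and are not taken from the paper or from Moses.

```python
from math import comb, log


def fisher_p_value(n, c_s, c_t, c_st):
    """One-sided Fisher exact test p-value for a phrase pair.

    n    : number of sentence pairs in the parallel corpus
    c_s  : sentence pairs whose source side contains the source phrase
    c_t  : sentence pairs whose target side contains the target phrase
    c_st : sentence pairs containing both phrases

    Returns the probability of seeing at least c_st co-occurrences
    if the two phrases were distributed independently.
    """
    upper = min(c_s, c_t)
    total = comb(n, c_t)
    # Hypergeometric tail: sum over all co-occurrence counts >= c_st.
    tail = sum(
        comb(c_s, k) * comb(n - c_s, c_t - k)
        for k in range(c_st, upper + 1)
    )
    return tail / total


def keep_pair(n, c_s, c_t, c_st, threshold):
    """Keep the pair when -log(p) exceeds the (hypothetical) threshold."""
    p = fisher_p_value(n, c_s, c_t, c_st)
    # A p-value that underflows to 0.0 is extremely significant, so keep it.
    return p == 0.0 or -log(p) > threshold


if __name__ == "__main__":
    # A pair seen exactly once, with each phrase also seen exactly once,
    # has p = 1/n, so -log(p) equals log(n).
    print(fisher_p_value(10, 1, 1, 1))
    print(keep_pair(1000, 3, 3, 3, threshold=log(1000)))
```

A pair in which both phrases occur once and co-occur once scores exactly log(n), so a threshold just above or below log(n) determines whether such singleton pairs survive pruning.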
Anthology ID:
2012.amta-papers.28
Volume:
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 28-November 1
Year:
2012
Address:
San Diego, California, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2012.amta-papers.28
Cite (ACL):
Howard Johnson. 2012. Conditional Significance Pruning: Discarding More of Huge Phrase Tables. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Conditional Significance Pruning: Discarding More of Huge Phrase Tables (Johnson, AMTA 2012)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2012.amta-papers.28.pdf