Abstract
Pruning the phrase tables used in statistical machine translation (SMT) can substantially reduce their size and improve translation quality, especially for very large corpora such as the Giga-FrEn corpus. Both gains can be extended by conditioning each significance test on the co-occurrence counts of other phrase pairs, which yields an additional reduction in size and an increase in BLEU score. A series of French-to-English experiments using Moses and the WMT11 corpora quantifies the improvement. A strong, reproducible research baseline was obtained by adhering strictly to the recommendations for the WMT11 baseline system.
- Anthology ID:
- 2012.amta-papers.28
- Volume:
- Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers
- Month:
- October 28-November 1
- Year:
- 2012
- Address:
- San Diego, California, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- URL:
- https://aclanthology.org/2012.amta-papers.28
- Cite (ACL):
- Howard Johnson. 2012. Conditional Significance Pruning: Discarding More of Huge Phrase Tables. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers, San Diego, California, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Conditional Significance Pruning: Discarding More of Huge Phrase Tables (Johnson, AMTA 2012)
- PDF:
- https://preview.aclanthology.org/autopr/2012.amta-papers.28.pdf
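The per-pair significance test that the abstract builds on can be illustrated with a minimal Python sketch. It assumes the unconditional formulation commonly used for significance pruning: a one-sided Fisher's exact test on the 2x2 contingency table of sentence-level counts for a phrase pair (s, t). The conditional test that is this paper's contribution is not reproduced here. The names `neg_log_fisher_p` and `prune`, the dictionary layout, and the default `epsilon` are hypothetical. A threshold just above ln N discards every pair that occurs once on each side and co-occurs once (the 1-1-1 case), because that table has p = 1/N exactly.

```python
import math
from typing import Dict, Tuple

# Hypothetical layout: (source phrase, target phrase) -> (c_st, c_s, c_t)
Counts = Tuple[int, int, int]


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed with lgamma
    so that very large corpus sizes do not overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def neg_log_fisher_p(c_st: int, c_s: int, c_t: int, n: int) -> float:
    """Return -ln p for a one-sided Fisher's exact test on the 2x2 table
    formed by a phrase pair (s, t) over n sentence pairs.

    c_st: sentence pairs with s on the source side and t on the target side
    c_s:  sentence pairs with s on the source side
    c_t:  sentence pairs with t on the target side
    p is the hypergeometric probability of observing c_st or more joint
    occurrences if s and t occurred independently.
    """
    if not (0 <= c_st <= min(c_s, c_t) and c_s + c_t - c_st <= n):
        raise ValueError("counts do not form a valid 2x2 contingency table")
    log_denom = log_comb(n, c_t)
    # Log-probability of every table at least as extreme as the observed one.
    terms = [
        log_comb(c_s, k) + log_comb(n - c_s, c_t - k) - log_denom
        for k in range(c_st, min(c_s, c_t) + 1)
    ]
    # Log-sum-exp keeps the tail sum stable when individual terms underflow.
    top = max(terms)
    log_p = top + math.log(sum(math.exp(x - top) for x in terms))
    return -log_p


def prune(pairs: Dict[Tuple[str, str], Counts], n: int,
          epsilon: float = 0.01) -> Dict[Tuple[str, str], Counts]:
    """Keep only pairs whose -ln p exceeds ln(n) + epsilon; this removes
    every 1-1-1 pair, whose p-value is exactly 1/n."""
    threshold = math.log(n) + epsilon
    return {pair: counts for pair, counts in pairs.items()
            if neg_log_fisher_p(*counts, n) > threshold}


if __name__ == "__main__":
    table = {("chat", "cat"): (5, 5, 5), ("le", "cat"): (1, 1, 1)}
    print(prune(table, 1000))  # {('chat', 'cat'): (5, 5, 5)}
```

With N = 1000 sentence pairs, the 1-1-1 pair scores ln 1000 (about 6.91) and falls just below the threshold, while a pair that co-occurs in all five of its occurrences scores ln C(1000, 5) and survives. Working in log space is a design choice for corpora of Giga-FrEn scale, where raw p-values underflow.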