Offline extraction of overlapping phrases for hierarchical phrase-based translation

Sariya Karimova, Patrick Simianer, Stefan Riezler


Abstract
Standard SMT decoders operate by translating disjoint spans of input words, thus discarding information in the form of overlapping phrases that is present at phrase extraction time. The use of overlapping phrases in translation may enhance fluency in positions that would otherwise be phrase boundaries, may provide additional statistical support for long and rare phrases, and may generate new phrases that have never been seen in the training data. We show how to extract overlapping phrases offline for hierarchical phrase-based SMT, and how to extract features and tune weights for the new phrases. We find gains of 0.3 to 0.6 BLEU points over discriminatively trained hierarchical phrase-based SMT systems on two datasets for German-to-English translation.
Anthology ID:
2014.iwslt-papers.12
Volume:
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers
Month:
December 4-5
Year:
2014
Address:
Lake Tahoe, California
Venue:
IWSLT
SIG:
SIGSLT
Pages:
236–243
URL:
https://aclanthology.org/2014.iwslt-papers.12
Cite (ACL):
Sariya Karimova, Patrick Simianer, and Stefan Riezler. 2014. Offline extraction of overlapping phrases for hierarchical phrase-based translation. In Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, pages 236–243, Lake Tahoe, California.
Cite (Informal):
Offline extraction of overlapping phrases for hierarchical phrase-based translation (Karimova et al., IWSLT 2014)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2014.iwslt-papers.12.pdf