Producing Unseen Morphological Variants in Statistical Machine Translation

Matthias Huck, Aleš Tamchyna, Ondřej Bojar, Alexander Fraser


Abstract
Translating into morphologically rich languages is difficult. Although the coverage of lemmas may be reasonable, many morphological variants cannot be learned from the training data. We present a statistical translation system that is able to produce these inflected word forms. Unlike most previous work, we do not separate morphological prediction from lexical choice into two consecutive steps. Our approach is novel in that it is integrated into decoding and takes advantage of context information from both the source-language and the target-language sides.
Anthology ID: E17-2059
Volume: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Mirella Lapata, Phil Blunsom, Alexander Koller
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 369–375
URL: https://aclanthology.org/E17-2059
Cite (ACL): Matthias Huck, Aleš Tamchyna, Ondřej Bojar, and Alexander Fraser. 2017. Producing Unseen Morphological Variants in Statistical Machine Translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 369–375, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Producing Unseen Morphological Variants in Statistical Machine Translation (Huck et al., EACL 2017)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/E17-2059.pdf