Abstract
We propose a method for integrating Japanese empty category detection into the preordering process of Japanese-to-English statistical machine translation. First, we apply machine-learning-based empty category detection to estimate the position and type of empty categories in the constituent tree of the source sentence. Then, we apply discriminative preordering to the augmented constituent tree, in which empty categories are treated as if they were normal lexical symbols. We find that it is effective to filter empty categories based on the confidence of estimation. Our experiments show that, for the IWSLT dataset consisting of short travel conversations, the insertion of empty categories alone improves the BLEU score from 33.2 to 34.3 and the RIBES score from 76.3 to 78.7, which implies that reordering has improved. For the KFTT dataset consisting of Wikipedia sentences, the proposed preordering method considering empty categories improves the BLEU score from 19.9 to 20.2 and the RIBES score from 66.2 to 66.3, which shows that both translation and reordering have improved slightly.
- Anthology ID:
- W16-4615
- Volume:
- Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Toshiaki Nakazawa, Hideya Mino, Chenchen Ding, Isao Goto, Graham Neubig, Sadao Kurohashi, Ir. Hammam Riza, Pushpak Bhattacharyya
- Venue:
- WAT
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 157–165
- URL:
- https://aclanthology.org/W16-4615
- Cite (ACL):
- Shunsuke Takeno, Masaaki Nagata, and Kazuhide Yamamoto. 2016. Integrating empty category detection into preordering Machine Translation. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 157–165, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Integrating empty category detection into preordering Machine Translation (Takeno et al., WAT 2016)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/W16-4615.pdf