Hitoshi Ito


2021

pdf bib
NHK’s Lexically-Constrained Neural Machine Translation at WAT 2021
Hideya Mino | Kazutaka Kinugawa | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This paper describes the system of our team (NHK) for the WAT 2021 Japanese-English restricted machine translation task. In this task, the aim is to improve quality while maintaining consistent terminology for scientific paper translation. This task has a unique feature, where some words in a target sentence are given in addition to a source sentence. In this paper, we use a lexically-constrained neural machine translation (NMT), which concatenates the source sentence and constrained words with a special token to input them into the encoder of NMT. The key to the successful lexically-constrained NMT is the way to extract constraints from a target sentence of training data. We propose two extraction methods: proper-noun constraint and mistranslated-word constraint. These two methods consider the importance of words and fallibility of NMT, respectively. The evaluation results demonstrate the effectiveness of our lexical-constraint method.

2020

pdf
Neural Machine Translation Using Extracted Context Based on Deep Analysis for the Japanese-English Newswire Task at WAT 2020
Isao Goto | Hideya Mino | Hitoshi Ito | Kazutaka Kinugawa | Ichiro Yamada | Hideki Tanaka
Proceedings of the 7th Workshop on Asian Translation

This paper describes the system of the NHK-NES team for the WAT 2020 Japanese–English newswire task. There are two main problems in Japanese-English news translation: translation of dropped subjects and compatibility between equivalent translations and English news-style outputs. We address these problems by extracting subjects from the context based on predicate-argument structures and using them as additional inputs, and constructing parallel Japanese-English news sentences equivalently translated from English news sentences. The evaluation results confirm the effectiveness of our context-utilization method.

pdf
Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT
Hideya Mino | Hideki Tanaka | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.

pdf
Effective Use of Target-side Context for Neural Machine Translation
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.

2019

pdf
Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 6th Workshop on Asian Translation

This paper describes NHK and NHK Engineering System (NHK-ES)’s submission to the newswire translation tasks of WAT 2019 in both directions of Japanese→English and English→Japanese. In addition to the JIJI Corpus that was officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually, translating Japanese news sentences into English content- equivalently. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.