2021
Field Experiments of Real Time Foreign News Distribution Powered by MT
Keiji Yasuda | Ichiro Yamada | Naoaki Okazaki | Hideki Tanaka | Hidehiro Asaka | Takeshi Anzai | Fumiaki Sugaya
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
Field experiments on a foreign news distribution system using two key technologies are reported. The first technology is a summarization component, which is used for generating news headlines. This component is a transformer-based abstractive text summarization system which is trained to output headlines from the leading sentences of news articles. The second technology is machine translation (MT), which enables users to read foreign news articles in their mother language. Since the system uses MT, users can immediately access the latest foreign news. 139 Japanese LINE users participated in the field experiments for two weeks, viewing about 40,000 articles which had been translated from English to Japanese. We carried out surveys both during and after the experiments. According to the results, 79.3% of users evaluated the headlines as adequate, while 74.7% of users evaluated the automatically translated articles as intelligible. According to the post-experiment survey, 59.7% of users wished to continue using the system; 11.5% of users did not. We also report several statistics of the experiments.
NHK’s Lexically-Constrained Neural Machine Translation at WAT 2021
Hideya Mino | Kazutaka Kinugawa | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
This paper describes the system of our team (NHK) for the WAT 2021 Japanese-English restricted machine translation task. In this task, the aim is to improve quality while maintaining consistent terminology for scientific paper translation. This task has a unique feature, where some words in a target sentence are given in addition to a source sentence. In this paper, we use a lexically-constrained neural machine translation (NMT), which concatenates the source sentence and constrained words with a special token to input them into the encoder of NMT. The key to the successful lexically-constrained NMT is the way to extract constraints from a target sentence of training data. We propose two extraction methods: proper-noun constraint and mistranslated-word constraint. These two methods consider the importance of words and fallibility of NMT, respectively. The evaluation results demonstrate the effectiveness of our lexical-constraint method.
2020
Effective Use of Target-side Context for Neural Machine Translation
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 28th International Conference on Computational Linguistics
In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.
Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT
Hideya Mino | Hideki Tanaka | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.
Neural Machine Translation Using Extracted Context Based on Deep Analysis for the Japanese-English Newswire Task at WAT 2020
Isao Goto | Hideya Mino | Hitoshi Ito | Kazutaka Kinugawa | Ichiro Yamada | Hideki Tanaka
Proceedings of the 7th Workshop on Asian Translation
This paper describes the system of the NHK-NES team for the WAT 2020 Japanese–English newswire task. There are two main problems in Japanese-English news translation: translation of dropped subjects and compatibility between equivalent translations and English news-style outputs. We address these problems by extracting subjects from the context based on predicate-argument structures and using them as additional inputs, and constructing parallel Japanese-English news sentences equivalently translated from English news sentences. The evaluation results confirm the effectiveness of our context-utilization method.
2019
Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 6th Workshop on Asian Translation
This paper describes NHK and NHK Engineering System (NHK-ES)’s submission to the newswire translation tasks of WAT 2019 in both directions of Japanese→English and English→Japanese. In addition to the JIJI Corpus that was officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually translating Japanese news sentences into English content-equivalently. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.
2017
Extracting Important Tweets for News Writers using Recurrent Neural Network with Attention Mechanism and Multi-task Learning
Taro Miyazaki | Shin Toriumi | Yuka Takei | Ichiro Yamada | Jun Goto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
Tweet Extraction for News Production Considering Unreality
Yuka Takei | Taro Miyazaki | Ichiro Yamada | Jun Goto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
2012
Measuring the Similarity between TV Programs using Semantic Relations
Ichiro Yamada | Masaru Miyazaki | Hideki Sumiyoshi | Atsushi Matsui | Hironori Furumiya | Hideki Tanaka
Proceedings of COLING 2012
2011
Relation Acquisition using Word Classes and Partial Patterns
Stijn De Saeger | Kentaro Torisawa | Masaaki Tsuchida | Jun’ichi Kazama | Chikara Hashimoto | Ichiro Yamada | Jong Hoon Oh | Istvan Varga | Yulan Yan
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
Extending WordNet with Hypernyms and Siblings Acquired from Wikipedia
Ichiro Yamada | Jong-Hoon Oh | Chikara Hashimoto | Kentaro Torisawa | Jun’ichi Kazama | Stijn De Saeger | Takuya Kawada
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
Co-STAR: A Co-training Style Algorithm for Hyponymy Relation Acquisition from Structured and Unstructured Text
Jong-Hoon Oh | Ichiro Yamada | Kentaro Torisawa | Stijn De Saeger
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
2009
Hypernym Discovery Based on Distributional Similarity and Hierarchical Structures
Ichiro Yamada | Kentaro Torisawa | Jun’ichi Kazama | Kow Kuroda | Masaki Murata | Stijn De Saeger | Francis Bond | Asuka Sumida
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
2004
Automatic Discovery of Telic and Agentive Roles from Corpus Data
Ichiro Yamada | Timothy Baldwin
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation