2021
Field Experiments of Real Time Foreign News Distribution Powered by MT
Keiji Yasuda | Ichiro Yamada | Naoaki Okazaki | Hideki Tanaka | Hidehiro Asaka | Takeshi Anzai | Fumiaki Sugaya
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
Field experiments on a foreign news distribution system using two key technologies are reported. The first technology is a summarization component, which is used for generating news headlines. This component is a transformer-based abstractive text summarization system trained to output headlines from the leading sentences of news articles. The second technology is machine translation (MT), which enables users to read foreign news articles in their native language. Since the system uses MT, users can immediately access the latest foreign news. A total of 139 Japanese LINE users participated in the field experiments for two weeks, viewing about 40,000 articles which had been translated from English to Japanese. We carried out surveys both during and after the experiments. According to the results, 79.3% of users evaluated the headlines as adequate, while 74.7% of users evaluated the automatically translated articles as intelligible. According to the post-experiment survey, 59.7% of users wished to continue using the system, while 11.5% did not. We also report several statistics from the experiments.
Named Entity-Factored Transformer for Proper Noun Translation
Kohichi Takai | Gen Hattori | Akio Yoneyama | Keiji Yasuda | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Subword-based neural machine translation (NMT) decreases the number of out-of-vocabulary (OOV) words and maintains translation quality even when input sentences include OOV words. Subword-based NMT decomposes a word into shorter units to solve the OOV problem, but this does not work well for non-compositional proper nouns, because the shorter units are constructed from words. Furthermore, omissions also occur in proper noun translation. The proposed method applies a Named Entity (NE) feature vector to the Factored Transformer for accurate proper noun translation. It uses two features: the input sentence in subword units and a feature obtained from Named Entity Recognition (NER). The proposed method alleviates the problem of translating non-compositional proper nouns that include low-frequency words. According to the experiments, the proposed method using the best NE feature vector outperformed the baseline subword-based transformer model by more than 9.6 points in proper noun accuracy and 2.5 points in BLEU score.
2018
Prediction Models for Risk of Type-2 Diabetes Using Health Claims
Masatoshi Nagata | Kohichi Takai | Keiji Yasuda | Panikos Heracleous | Akio Yoneyama
Proceedings of the BioNLP 2018 workshop
This study focuses on highly accurate prediction of the onset of type-2 diabetes. We investigated whether prediction accuracy can be improved by utilizing lab test data obtained from health checkups and incorporating health claim text data such as medically diagnosed diseases with ICD10 codes and pharmacy information. In a previous study, prediction accuracy increased slightly when diagnosed disease names and independent variables such as prescription medicine were added. Therefore, in the current study we explored more suitable prediction models using state-of-the-art techniques such as XGBoost and long short-term memory (LSTM) based on recurrent neural networks. Text data was vectorized using word2vec, and the prediction models were compared with logistic regression. The results confirmed that the onset of type-2 diabetes can be predicted with a high degree of accuracy when the XGBoost model is used.
2011
Annotating data selection for improving machine translation
Keiji Yasuda | Hideo Okuma | Masao Utiyama | Eiichiro Sumita
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers
To improve machine translation systems efficiently, we propose a method which selects data to be annotated (manually translated) from speech-to-speech translation field data. The field data come from experiments conducted during the 2009 fiscal year in five areas of Japan. For the selection experiments, we used data sets from two of these areas: one data set giving the lowest baseline speech translation performance for its test set, and another data set giving the highest. In the experiments, we compare two methods for selecting data to be manually translated from the field data. Both of them use source-side language models for data selection, but in different manners. According to the experimental results, at least one of the two methods shows a larger improvement than random data selection.
2009
Mining Parallel Texts from Mixed-Language Web Pages
Masao Utiyama | Daisuke Kawahara | Keiji Yasuda | Eiichiro Sumita
Proceedings of Machine Translation Summit XII: Papers
2008
The NICT/ATR speech translation system for IWSLT 2008.
Masao Utiyama | Andrew Finch | Hideo Okuma | Michael Paul | Hailong Cao | Hirofumi Yamamoto | Keiji Yasuda | Eiichiro Sumita
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the National Institute of Information and Communications Technology/Advanced Telecommunications Research Institute International (NICT/ATR) statistical machine translation (SMT) system used for the IWSLT 2008 evaluation campaign. We participated in the Chinese–English (Challenge Task), English–Chinese (Challenge Task), Chinese–English (BTEC Task), Chinese–Spanish (BTEC Task), and Chinese–English–Spanish (PIVOT Task) translation tasks. In the English–Chinese Challenge Task, we focused on exploring various factors affecting English–Chinese translation, because research on this direction is scarce compared to the opposite direction. In the Chinese–English Challenge Task, we employed a novel clustering method, where training sentences similar to the development data in terms of word error rate formed a cluster. In the pivot translation task, we integrated two strategies for pivot translation by linear interpolation.
Method of Selecting Training Data to Build a Compact and Efficient Translation Model
Keiji Yasuda | Ruiqiang Zhang | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II
Improved Statistical Machine Translation by Multiple Chinese Word Segmentation
Ruiqiang Zhang | Keiji Yasuda | Eiichiro Sumita
Proceedings of the Third Workshop on Statistical Machine Translation
2007
Method of selecting training sets to build compact and efficient language model
Keiji Yasuda | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the Workshop on Using corpora for natural language generation
The NICT/ATR speech translation system for IWSLT 2007
Andrew Finch | Etienne Denoual | Hideo Okuma | Michael Paul | Hirofumi Yamamoto | Keiji Yasuda | Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the Fourth International Workshop on Spoken Language Translation
This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2007 evaluation campaign. We participated in three of the four language pair translation tasks (CE, JE, and IE). We used a phrase-based SMT system with log-linear feature models for all tracks. This year we decoded from the ASR n-best lists in the JE track and found a gain in performance. We also applied some new techniques to facilitate the use of out-of-domain external resources, both by model combination and by utilizing a huge corpus of n-grams provided by Google Inc. Using these resources gave mixed results that depended on both the technique and the language pair; however, in some cases we achieved consistently positive results. The results from model interpolation in particular were very promising.
2006
The NiCT-ATR statistical machine translation system for IWSLT 2006
Ruiqiang Zhang | Hirofumi Yamamoto | Michael Paul | Hideo Okuma | Keiji Yasuda | Yves Lepage | Etienne Denoual | Daichi Mochihashi | Andrew Finch | Eiichiro Sumita
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign
2005
Assessing Degradation of Spoken Language Translation by Measuring Speech Recognizer’s Output against Non-native Speakers’ Listening Capabilities
Toshiyuki Takezawa | Keiji Yasuda | Masahide Mizushima | Genichiro Kikui
Proceedings of Machine Translation Summit X: Papers
2004
Automatic Measuring of English Language Proficiency using MT Evaluation Technology
Keiji Yasuda | Fumiaki Sugaya | Eiichiro Sumita | Toshiyuki Takezawa | Genichiro Kikui | Seiichi Yamamoto
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning
2003
Applications of Automatic Evaluation Methods to Measuring a Capability of Speech Translation System
Keiji Yasuda | Fumiaki Sugaya | Toshiyuki Takezawa | Seiichi Yamamoto | Masuzo Yanagida
10th Conference of the European Chapter of the Association for Computational Linguistics
2002
Quality-Sensitive Test Set Selection for a Speech Translation System
Fumiaki Sugaya | Keiji Yasuda | Toshiyuki Takezawa | Seiichi Yamamoto
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems
Automatic machine translation selection scheme to output the best result
Keiji Yasuda | Fumiaki Sugaya | Toshiyuki Takezawa | Seiichi Yamamoto | Masuzo Yanagida
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2001
Precise measurement method of a speech translation system’s capability with a paired comparison method between the system and humans
Fumiaki Sugaya | Keiji Yasuda | Toshiyuki Takezawa | Seiichi Yamamoto
Proceedings of Machine Translation Summit VIII
The main goal of the present paper is to propose new schemes for the overall evaluation of a speech translation system. These schemes are expected to support and improve the design of the target application system, and precisely determine its performance. Experiments are conducted on the Japanese-to-English speech translation system ATR-MATRIX, which was developed at ATR Interpreting Telecommunications Research Laboratories. In the proposed schemes, the system’s translations are compared with those of a native Japanese taking the Test of English for International Communication (TOEIC), which is used as a measure of one’s speech translation capability. Subjective and automatic comparisons are made and the results are compared. A regression analysis on the subjective results shows that the speech translation capability of ATR-MATRIX matches a Japanese person scoring around 500 on the TOEIC. The automatic comparisons also show promising results.
An automatic evaluation method of translation quality using translation answer candidates queried from a parallel corpus
Keiji Yasuda | Fumiaki Sugaya | Toshiyuki Takezawa | Seiichi Yamamoto | Masuzo Yanagida
Proceedings of Machine Translation Summit VIII
An automatic translation quality evaluation method is proposed. In the proposed method, a parallel corpus is used to query translation answer candidates. The translation output is evaluated by measuring the similarity between the translation output and translation answer candidates with DP matching. This method evaluates a language translation subsystem of the Japanese-to-English ATR-MATRIX speech translation system developed at ATR Interpreting Telecommunications Research Laboratories. Discriminant analysis is then carried out to examine the evaluation performance of the proposed method. Experimental results show the effectiveness of the proposed method. The discriminant ratio is 83.5% for 2-class discrimination between absolutely correct and less appropriate translations classified subjectively. Also discussed are issues of the proposed method when it is applied to speech translation systems which inevitably make recognition errors.
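The DP-matching step described in the abstract above can be illustrated with a minimal sketch. It assumes word-level edit distance as the DP matching, normalizes by the longer sequence length, and takes the best score over all answer candidates; these choices and the function names (`edit_distance`, `similarity`, `best_candidate_score`) are illustrative assumptions, not the scoring actually used in the paper.

```python
from typing import List, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance computed by dynamic programming."""
    # prev[j] holds the distance between the first i-1 words of hyp
    # and the first j words of ref.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a word from hyp
                            curr[j - 1] + 1,      # add a word from ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Similarity in [0, 1]; the normalization is an assumption of this sketch."""
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(hyp, ref) / longest


def best_candidate_score(output: str, candidates: List[str]) -> float:
    """Score a translation output against its closest answer candidate."""
    hyp = output.split()
    return max(similarity(hyp, cand.split()) for cand in candidates)
```

For example, `best_candidate_score("i want a coffee", ["i want coffee", "how are you"])` scores the output against both candidates and keeps the higher similarity. The function raises `ValueError` when the candidate list is empty, so a caller would need to handle queries that return no candidates.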