Kazuhide Yamamoto

2018

Crowdsourced Corpus of Sentence Simplification with Core Vocabulary
Akihiro Katsuta | Kazuhide Yamamoto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Simplified Corpus with Core Vocabulary
Takumi Maruyama | Kazuhide Yamamoto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Lexical Substitution is Practical for Rare Word Simplification
Takumi Maruyama | Kazuhide Yamamoto
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Dataset Construction Method for Word Reading Disambiguation
Koki Nishiyama | Kazuhide Yamamoto | Hideharu Nakajima
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

VietSentiLex: a sentiment dictionary that considers the polarity of ambiguous sentiment words
Huynh Quoc Viet Vo | Kazuhide Yamamoto
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2017

Controlling Target Features in Neural Machine Translation via Prefix Constraints
Shunsuke Takeno | Masaaki Nagata | Kazuhide Yamamoto
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

We propose prefix constraints, a novel method to enforce constraints on target sentences in neural machine translation. It places a sequence of special tokens at the beginning of the target sentence (target prefix), whereas side constraints place a special token at the end of the source sentence (source suffix). Prefix constraints can be predicted from the source sentence jointly with the target sentence, whereas side constraints (Sennrich et al., 2016) must be provided by the user or predicted by some other method. In both methods, special tokens are designed to encode arbitrary target-side features or metatextual information. We show that prefix constraints are more flexible than side constraints and can be used to control the behavior of neural machine translation in terms of output length, bidirectional decoding, domain adaptation, and unaligned target word generation.
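
The difference between the two token placements described in this abstract can be illustrated with a minimal sketch. The token format (`<name:value>`), the function names, and the feature names below are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import List, Tuple

Feature = Tuple[str, str]  # (feature name, feature value), hypothetical format


def make_token(feature: Feature) -> str:
    """Render a feature as a special token, e.g. ("len", "short") -> "<len:short>"."""
    name, value = feature
    return f"<{name}:{value}>"


def add_prefix_constraints(target: List[str], features: List[Feature]) -> List[str]:
    """Prefix constraints: a sequence of special tokens placed before the target sentence."""
    return [make_token(f) for f in features] + target


def add_side_constraint(source: List[str], feature: Feature) -> List[str]:
    """Side constraint: a single special token placed after the source sentence."""
    return source + [make_token(feature)]


source = ["kore", "wa", "pen", "desu"]
target = ["this", "is", "a", "pen"]

# Target-side training sequence with two prefix tokens (illustrative features).
prefixed_target = add_prefix_constraints(target, [("len", "short"), ("dir", "l2r")])

# Source-side training sequence with one suffix token.
suffixed_source = add_side_constraint(source, ("len", "short"))
```

Because the prefix tokens sit on the target side, a decoder trained on such sequences can generate them itself before the translation, which corresponds to the abstract's point that prefix constraints can be predicted jointly with the target sentence.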

2016

Integrating empty category detection into preordering Machine Translation
Shunsuke Takeno | Masaaki Nagata | Kazuhide Yamamoto
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

We propose a method for integrating Japanese empty category detection into the preordering process of Japanese-to-English statistical machine translation. First, we apply machine-learning-based empty category detection to estimate the position and the type of empty categories in the constituent tree of the source sentence. Then, we apply discriminative preordering to the augmented constituent tree, in which empty categories are treated as if they were normal lexical symbols. We find that it is effective to filter empty categories based on the confidence of estimation. Our experiments show that, for the IWSLT dataset consisting of short travel conversations, the insertion of empty categories alone improves the BLEU score from 33.2 to 34.3 and the RIBES score from 76.3 to 78.7, which implies that reordering has improved. For the KFTT dataset consisting of Wikipedia sentences, the proposed preordering method considering empty categories improves the BLEU score from 19.9 to 20.2 and the RIBES score from 66.2 to 66.3, which shows that both translation and reordering have improved slightly.

In order to investigate the effect of source language on translations, we investigate two variants of a Korean translation corpus. The first variant consists of Korean translations of 162,308 Japanese sentences from the ATR BTEC (Basic Expression Text Corpus). The second variant was made by translating the English translations of the Japanese sentences into Korean. We show that the source language text has a large influence on the target text. Even after normalizing orthographic differences, fewer than 8.3% of the sentences in the two variants were identical. We describe in general which phenomena differ and then discuss how our analysis can be used in natural language processing.

Detecting Transliterated Orthographic Variants via Two Similarity Metrics
Kiyonori Ohtake | Youichi Sekiguchi | Kazuhide Yamamoto
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Applicability Analysis of Corpus-derived Paraphrases toward Example-based Paraphrasing
Kiyonori Ohtake | Kazuhide Yamamoto
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

2002

Paraphrasing of Chinese Utterances
Yujie Zhang | Kazuhide Yamamoto
COLING 2002: The 19th International Conference on Computational Linguistics

Machine Translation by Interaction between Paraphraser and Transfer
Kazuhide Yamamoto
COLING 2002: The 19th International Conference on Computational Linguistics

Corpus-assisted expansion of manual MT knowledge:
Setsuo Yamada | Kenji Imamura | Kazuhide Yamamoto
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Acquisition of Lexical Paraphrases from Texts
Kazuhide Yamamoto
COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology

Towards a Thesaurus of Predicates
Satoshi Shirai | Kazuhide Yamamoto | Francis Bond | Hozumi Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

1999

ATR has built a multi-language speech translation system called ATR-MATRIX. It consists of a spoken-language translation subsystem, which is the focus of this paper, together with a highly accurate speech recognition subsystem and a high-definition speech synthesis subsystem. This paper gives a road map of solutions to the problems inherent in spoken-language translation. Spoken-language translation systems need to tackle difficult problems such as ungrammaticality, contextual phenomena, speech recognition errors, and the high speeds required for real-time use. We have made great strides towards solving these problems in recent years. Our approach mainly uses an example-based translation model called TDMT. We have added the use of extra-linguistic information, a decision tree learning mechanism, and methods for dealing with recognition errors.

Corpus-Based Anaphora Resolution Towards Antecedent Preference
Michael Paul | Kazuhide Yamamoto | Eiichiro Sumita
Coreference and Its Applications