Hiroyuki Kaji


Towards cross-lingual patent wikification
Takashi Tsunakawa | Hiroyuki Kaji
Proceedings of the 6th Workshop on Patent and Scientific Literature Translation


Enriching Wikipedia’s Intra-language Links by their Cross-language Transfer
Takashi Tsunakawa | Makoto Araya | Hiroyuki Kaji
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers


Improving Calculation of Contextual Similarity for Constructing a Bilingual Dictionary via a Third Language
Takashi Tsunakawa | Yosuke Yamamoto | Hiroyuki Kaji
Proceedings of the Sixth International Joint Conference on Natural Language Processing


Augmenting a Bilingual Lexicon with Information for Word Translation Disambiguation
Takashi Tsunakawa | Hiroyuki Kaji
Proceedings of the Eighth Workshop on Asian Language Resources

Using Comparable Corpora to Adapt a Translation Model to Domains
Hiroyuki Kaji | Takashi Tsunakawa | Daisuke Okada
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Statistical machine translation (SMT) requires a large parallel corpus, which is available only for restricted language pairs and domains. To expand the language pairs and domains to which SMT is applicable, we created a method for estimating translation pseudo-probabilities from bilingual comparable corpora. The essence of our method is to calculate pairwise correlations between the words associated with a source-language word, presently restricted to a noun, and its translations; word translation pseudo-probabilities are calculated based on the assumption that the more associated words a translation is correlated with, the higher its translation probability. We also describe a method we created for calculating noun-sequence translation pseudo-probabilities based on occurrence frequencies of noun sequences and constituent-word translation pseudo-probabilities. Then, we present a framework for merging the translation pseudo-probabilities estimated from in-domain comparable corpora with a translation model learned from an out-of-domain parallel corpus. Experiments using Japanese and English comparable corpora of scientific paper abstracts and a Japanese-English parallel corpus of patent abstracts showed promising results; the BLEU score was improved to some degree by incorporating the pseudo-probabilities estimated from the in-domain comparable corpora. Future work includes an optimization of the parameters and an extension to estimate translation pseudo-probabilities for verbs.


Automatic Construction of a Japanese-Chinese Dictionary via English
Hiroyuki Kaji | Shin’ichi Tamamura | Dashtseren Erdenebat
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes a method of constructing a dictionary for a pair of languages from bilingual dictionaries between each of the languages and a third language. Such a method would be useful for language pairs for which wide-coverage bilingual dictionaries are not available, but it suffers from spurious translations caused by the ambiguity of intermediary third-language words. To eliminate spurious translations, the proposed method uses monolingual corpora of the first and second languages, whose availability is not as limited as that of parallel corpora. Extracting word associations from the corpora of both languages, the method correlates the associated words of an entry word with its translation candidates. It then selects translation candidates that have the highest correlations with a certain percentage or more of the associated words. The method has two features. First, it produces a domain-adapted bilingual dictionary. Second, the resulting bilingual dictionary, which provides not only translations but also the associated words supporting each translation, enables context-based selection of translations. Preliminary experiments using the EDR Japanese-English and LDC Chinese-English dictionaries together with Mainichi Newspaper and Xinhua News Agency corpora demonstrate that the proposed method is viable. The recall and precision could be improved by optimizing the parameters.
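
The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_share` threshold, the toy dictionaries, and the precomputed `correlation` table are all hypothetical, and the abstract does not say how the correlations themselves are computed from the corpora.

```python
def pivot_candidates(src_to_pivot, pivot_to_tgt, word):
    """Collect target-language candidates reachable through any pivot word."""
    candidates = set()
    for pivot in src_to_pivot.get(word, ()):
        candidates.update(pivot_to_tgt.get(pivot, ()))
    return candidates


def select_translations(candidates, associates, correlation, min_share=0.3):
    """Keep candidates that are the best-correlated one for at least
    `min_share` of the entry word's associated words (assumed criterion)."""
    if not candidates or not associates:
        return set()
    wins = {c: 0 for c in candidates}
    for assoc in associates:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=lambda c: correlation.get((assoc, c), 0.0))
        if correlation.get((assoc, best), 0.0) > 0.0:
            wins[best] += 1
    return {c for c, n in wins.items() if n / len(associates) >= min_share}


# Toy, made-up data: the English pivot "bank" is ambiguous.
ja_en = {"銀行": ["bank"]}
en_zh = {"bank": ["银行", "河岸"]}
associates = ["融資", "預金", "金利"]  # associated words of 銀行
correlation = {
    ("融資", "银行"): 0.8,
    ("預金", "银行"): 0.7,
    ("金利", "银行"): 0.6,
    ("金利", "河岸"): 0.1,
}

candidates = pivot_candidates(ja_en, en_zh, "銀行")
selected = select_translations(candidates, associates, correlation)
print(selected)
```

In this toy case the spurious candidate 河岸 (riverbank) is never the best-correlated translation for any associated word, so only 银行 survives the filter.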


Development of a Japanese-Chinese machine translation system
Hitoshi Isahara | Sadao Kurohashi | Jun’ichi Tsujii | Kiyotaka Uchimoto | Hiroshi Nakagawa | Hiroyuki Kaji | Shun’ichi Kikuchi
Proceedings of Machine Translation Summit XI: Papers


Automatic Construction of Japanese WordNet
Hiroyuki Kaji | Mariko Watanabe
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Although WordNets have been developed for a number of languages, no attempt to construct a Japanese WordNet is known. Taking this into account, we launched a project to automatically translate the Princeton WordNet into Japanese by a method of unsupervised word-sense disambiguation using bilingual comparable corpora. The method we propose aligns English word associations with those in Japanese and iteratively calculates a correlation matrix of Japanese translations of an English word versus its associated words. It then determines the Japanese translation for the English word in a synset by calculating scores for translation candidates according to the correlation matrix and the associated words appearing in the gloss appended to the synset. On its own, this method is not robust because a gloss contains only a few associated words. To overcome this difficulty, we extended the method so that it retrieves texts by using the gloss as a query and uses the retrieved texts as well as the gloss to calculate scores for translation candidates. A preliminary experiment using Wall Street Journal and Nihon Keizai Shimbun corpora demonstrated that the proposed method is promising for constructing a Japanese WordNet.


Domain Dependence of Lexical Translation: A Case Study of Patent Abstracts
Hiroyuki Kaji
Workshop on patent translation

The domain dependence of translations of nouns in English-to-Japanese patent translation is examined using an automatic method for identifying major translations from a pair of language corpora in the same domain. The method calculates the ratio of the number of associated words of a target word that suggest each translation of the target word to the total number of associated words. This ratio indicates how major a translation is in a domain. Application of the method to a bilingual patent-abstract corpus indicates the necessity and effectiveness of dividing the patent domain into subdomains and adapting a bilingual dictionary to subdomains.
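
The ratio described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the function name and the toy data are hypothetical, and it assumes each associated word suggests at most one translation (or none), which the abstract does not specify.

```python
from collections import Counter


def translation_ratios(suggested_by_associate):
    """Return, for each translation, the share of associated words suggesting it.

    `suggested_by_associate` maps each associated word of the target word to
    the translation it suggests, or None when it suggests none. The
    denominator is the total number of associated words.
    """
    total = len(suggested_by_associate)
    if total == 0:
        return {}
    counts = Counter(t for t in suggested_by_associate.values() if t is not None)
    return {t: n / total for t, n in counts.items()}


# Toy, made-up data for the English target word "cell".
associates = {
    "electrode": "電池",
    "voltage": "電池",
    "anode": "電池",
    "membrane": "細胞",
    "layer": None,
}
ratios = translation_ratios(associates)
print(ratios)
```

A translation with a high ratio would be treated as a major translation in the domain the corpus represents; in the toy data, 電池 is suggested by three of the five associated words.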


Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora
Hiroyuki Kaji | Osamu Imaichi
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A novel thesaurus named a gword-sense association networkh is proposed for the first time. It consists of nodes representing word senses, each of which is defined as a set consisting of a word and its translation equivalents, and edges connecting topically associated word senses. This word-sense association network is produced from a bilingual dictionary and comparable corpora by means of a newly developed fully automatic method. The feasibility and effectiveness of the method were demonstrated experimentally by using the EDR English-Japanese dictionary together with Wall Street Journal and Nihon Keizai Shimbun corpora. The word-sense association networks were applied to word-sense disambiguation as well as to a query interface for information retrieval.

Adapted seed lexicon and combined bidirectional similarity measures for translation equivalent extraction from comparable corpora
Hiroyuki Kaji
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

Bilingual-Dictionary Adaptation to Domains
Hiroyuki Kaji
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics


Word Sense Acquisition from Bilingual Comparable Corpora
Hiroyuki Kaji
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics


Unsupervised Word Sense Disambiguation Using Bilingual Comparable Corpora
Hiroyuki Kaji | Yasutsugu Morimoto
COLING 2002: The 19th International Conference on Computational Linguistics


Corpus-dependent Association Thesauri for Information Retrieval
Hiroyuki Kaji | Yasutsugu Morimoto | Toshiko Aizono | Noriyuki Yamasaki
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics


Controlled languages for machine translation: state of the art
Hiroyuki Kaji
Proceedings of Machine Translation Summit VII


Extracting Word Correspondences from Bilingual Corpora Based on Word Co-occurrence Information
Hiroyuki Kaji | Toshiko Aizono
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics


Learning Translation Templates From Bilingual Text
Hiroyuki Kaji | Yuuko Kida | Yasutsugu Morimoto
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics


Language control for effective utilization of HICATS/JE
Hiroyuki Kaji
Proceedings of Machine Translation Summit II


An Efficient Execution Method for Rule-Based Machine Translation
Hiroyuki Kaji
Coling Budapest 1988 Volume 2: International Conference on Computational Linguistics


HICATS/JE: A Japanese-to-English Machine Translation System Based on Semantics
Hiroyuki Kaji
Proceedings of Machine Translation Summit I


A Proper Treatment of Syntax and Semantics in Machine Translation
Yoshihiko Nitta | Atsushi Okajima | Hiroyuki Kaji | Youichi Hidano | Koichiro Ishihara
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics