Shin’ichi Tamamura


2008

pdf
Automatic Construction of a Japanese-Chinese Dictionary via English
Hiroyuki Kaji | Shin’ichi Tamamura | Dashtseren Erdenebat
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes a method of constructing a dictionary for a pair of languages from bilingual dictionaries between each of the languages and a third language. Such a method would be useful for language pairs for which wide-coverage bilingual dictionaries are not available, but it suffers from spurious translations caused by the ambiguity of intermediary third-language words. To eliminate spurious translations, the proposed method uses the monolingual corpora of the first and second languages, whose availability is not as limited as that of parallel corpora. Extracting word associations from the corpora of both languages, the method correlates the associated words of an entry word with its translation candidates. It then selects translation candidates that have the highest correlations with a certain percentage or more of the associated words. The method has the following features. It first produces a domain-adapted bilingual dictionary. Second, the resulting bilingual dictionary, which not only provides translations but also associated words supporting each translation, enables contextually based selection of translations. Preliminary experiments using the EDR Japanese-English and LDC Chinese-English dictionaries together with Mainichi Newspaper and Xinhua News Agency corpora demonstrate that the proposed method is viable. The recall and precision could be improved by optimizing the parameters.