Keita Tsuji


2010

Automatic Term Recognition Based on the Statistical Differences of Relative Frequencies in Different Corpora
Junko Kubo | Keita Tsuji | Shigeo Sugimoto
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we propose a method for automatic term recognition (ATR) that uses the statistical differences between the relative frequencies of terms in a target domain corpus and in other corpora. Generally, target terms appear more frequently in the target domain corpus than in corpora from other domains, and exploiting this characteristic should improve extraction performance. Most ATR methods proposed so far use only the target domain corpus and do not take this characteristic into account. For the extraction experiment, we used the abstracts of a women's studies journal as the target domain corpus and the abstracts of academic journals from 39 domains as the other domain corpora. The women's studies terms used to evaluate extraction were identified manually in the abstracts. We analyzed the extraction performance and found that our method outperformed earlier methods. The earlier methods were based on C-value and FLR, as well as methods that also make use of other domain corpora.
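The abstract does not state which statistic the authors use, so the following is only a minimal sketch of the general idea of contrasting relative frequencies across corpora. It assumes whitespace-tokenized single-word candidates and swaps in a plain two-proportion z statistic as the scoring function; the function name `relative_frequency_z` and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def relative_frequency_z(target_tokens, other_tokens):
    """Score each token of the target corpus by a two-proportion z statistic.

    Assumption: this z statistic is an illustrative stand-in, not the
    statistic used in the paper. A large positive score means the token is
    relatively more frequent in the target corpus than in the other corpora.
    """
    n_t, n_o = len(target_tokens), len(other_tokens)
    if n_t == 0 or n_o == 0:
        raise ValueError("both corpora must be non-empty")
    count_t, count_o = Counter(target_tokens), Counter(other_tokens)
    scores = {}
    for term, k_t in count_t.items():
        k_o = count_o.get(term, 0)
        p_t, p_o = k_t / n_t, k_o / n_o          # relative frequencies
        pooled = (k_t + k_o) / (n_t + n_o)       # pooled proportion
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_o))
        scores[term] = (p_t - p_o) / se if se > 0 else 0.0
    return scores


# Toy corpora (hypothetical): one target-domain text, one from another domain.
target = "gender roles shape the gender gap".split()
other = "the market shapes the price of the bond".split()
scores = relative_frequency_z(target, other)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting, words that are proportionally more frequent in the target text receive positive scores, while a word such as "the", which is proportionally more frequent in the other text, receives a negative one. A realistic system would score multi-word candidates rather than single tokens.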

2008

Temporal Aspects of Terminology for Automatic Term Recognition: Case Study on Women’s Studies Terms
Junko Kubo | Keita Tsuji | Shigeo Sugimoto
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The purpose of this paper is to clarify the temporal aspect of terminology, focusing on the impact of dictionaries on terms. We used women’s studies terms as data and examined how their values on five automatic term recognition (ATR) measures changed before and after dictionary publication. We also examined the changes in precision and recall of extraction based on these measures. The measures are TFIDF, C-value, MC-value, Nakagawa’s FLR, and simple document frequency. We found that being listed in dictionaries gives terms longevity and prevents them from losing the termhood represented by these ATR measures. Peripheral or relatively less important terms are more likely to be influenced by dictionaries, and their termhood increases after they are listed. Among the aspects of termhood, the potential for word formation, which can be measured by Nakagawa’s FLR, seemed to be influenced most, and terms gradually gained it after being listed in dictionaries.

2004

Extracting Low-frequency Translation Pairs from Japanese-English Bilingual Corpora
Keita Tsuji | Kyo Kageura
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

2002

Extracting French-Japanese Word Pairs from Bilingual Corpora based on Transliteration Rules
Keita Tsuji | Beatrice Daille | Kyo Kageura
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Automatic Thesaurus Generation through Multiple Filtering
Kyo Kageura | Keita Tsuji | Akiko N. Aizawa
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics