Atsushi Fujii

Reflecting the rapid growth of science, technology, and culture, it has become common practice to consult tools on the World Wide Web for various terms. Existing search engines provide an enormous volume of information, but retrieved information is not organized. Hand-compiled encyclopedias provide organized information, but the quantity of information is limited. In this paper, aiming to integrate the advantages of both tools, we propose a method to organize a search result based on multiple viewpoints as in Wikipedia. Because the viewpoints required for explanation differ depending on the type of a term, such as animal or disease, we model articles in Wikipedia to extract a viewpoint structure for each term type. To identify a set of term types, we independently use manual annotation and automatic document clustering for Wikipedia articles. We also propose an effective feature for clustering Wikipedia articles. We experimentally show that document clustering reduces the cost of manual annotation while maintaining the accuracy of modeling Wikipedia articles.

2010

Modeling Wikipedia Articles to Enhance Encyclopedic Search
Atsushi Fujii
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Reflecting the rapid growth of science, technology, and culture, it has become common practice to consult tools on the World Wide Web for various terms. Existing search engines provide an enormous volume of information, but retrieved information is not organized. Hand-compiled encyclopedias provide organized information, but the quantity of information is limited. To integrate the advantages of both tools, we have been proposing methods for encyclopedic search targeting information on the Web and patent information. In this paper, we propose a method to categorize multiple expository texts for a single term based on viewpoints. Because the viewpoints required for explanation differ depending on the type of a term, such as animals and diseases, it is difficult to manually produce a large-scale system. We use Wikipedia to extract a prototype of a viewpoint structure for each term type. We also use articles in Wikipedia for a machine learning method, which categorizes a given text into an appropriate viewpoint. We evaluate the effectiveness of our method experimentally.

2009

Exploiting Patent Information for the Evaluation of Machine Translation
Atsushi Fujii | Masao Utiyama | Mikio Yamamoto | Takehito Utsuro
Proceedings of the Third Workshop on Patent Translation

2008

A Lemmatization Method for Modern Mongolian and its Application to Information Retrieval
Badam-Osor Khaltar | Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Effects of Related Term Extraction in Transliteration into Chinese
HaiXiang Huang | Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Statistical Machine Translation based Passage Retrieval for Cross-Lingual Question Answering
Tomoyosi Akiba | Kei Shimizu | Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop
Atsushi Fujii | Masao Utiyama | Mikio Yamamoto | Takehito Utsuro
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

To support research and development on machine translation, we produced a test collection for Japanese-English machine translation in the seventh NTCIR Workshop. This paper describes details of our test collection. From patent documents published in Japan and the United States, we extracted patent families as a parallel corpus. A patent family is a set of patent documents for the same or related invention, and these documents are usually filed in more than one country in different languages. In the parallel corpus, we aligned Japanese sentences with their counterpart English sentences. Our test collection, which includes approximately 2,000,000 sentence pairs, can be used to train and test machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, so the contribution of machine translation to a patent retrieval task can also be evaluated. Our test collection will be available to the public for research purposes after the NTCIR final meeting.

Producing an Encyclopedic Dictionary using Patent Documents
Atsushi Fujii
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Although the World Wide Web has lately become an important source to consult for the meaning of words, a number of technical terms related to high technology are not found on the Web. This paper describes a method to produce an encyclopedic dictionary for high-tech terms from patent information. We used a collection of unexamined patent applications published by the Japanese Patent Office as a source corpus. Given this collection, we extracted terms as headword candidates and retrieved applications including those headwords. Then, we extracted paragraph-style descriptions and categorized them into technical domains. We also extracted related terms for each headword. We have produced a dictionary including approximately 400,000 Japanese terms as headwords. We have also implemented an interface with which users can explore our dictionary by reading text descriptions and viewing a related-term graph.

Toward the Evaluation of Machine Translation Using Patent Information
Atsushi Fujii | Masao Utiyama | Mikio Yamamoto | Takehito Utsuro
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2,000,000 sentence pairs in Japanese and English, which were extracted automatically from our parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages. This paper describes our test collection, methods for evaluating machine translation, and preliminary experiments.

2006

Statistical Analysis for Thesaurus Construction using an Encyclopedic Corpus
Yasunori Ohishi | Katunobu Itou | Kazuya Takeda | Atsushi Fujii
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper proposes a discrimination method for hierarchical relations between word pairs. The method is a statistical one using an encyclopedic corpus extracted and organized from Web pages. In the proposed method, we use the statistical nature that hyponyms' descriptions tend to include hypernyms whereas hypernyms' descriptions do not include all of the hyponyms. Experimental results show that the method detected 61.7% of the relations in an actual thesaurus.
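
The asymmetry described in this abstract can be illustrated with a toy sketch. This is not the paper's statistical method: the function names, the whole-word matching rule, and the three-entry corpus below are all assumptions made for illustration only.

```python
def mentions(descriptions, term, other):
    """True if the description of `term` contains `other` as a whole word."""
    return other.lower() in descriptions.get(term, "").lower().split()


def guess_hypernym(descriptions, a, b):
    """Guess which of `a` and `b` is the hypernym, or None if undecided.

    Toy rule (an assumption, not the published method): if a's description
    mentions b but b's description does not mention a, treat b as the hypernym.
    """
    a_mentions_b = mentions(descriptions, a, b)
    b_mentions_a = mentions(descriptions, b, a)
    if a_mentions_b and not b_mentions_a:
        return b
    if b_mentions_a and not a_mentions_b:
        return a
    return None  # symmetric or missing evidence


# Hypothetical miniature "encyclopedic corpus": term -> description text.
toy_corpus = {
    "sparrow": "a small bird that lives near people",
    "bird": "an animal with feathers and wings",
    "animal": "a living organism that moves and eats",
}

print(guess_hypernym(toy_corpus, "sparrow", "bird"))
print(guess_hypernym(toy_corpus, "bird", "animal"))
```

A real system would aggregate such evidence statistically over many descriptions per term rather than rely on a single yes/no check.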

Test Collections for Patent Retrieval and Patent Classification in the Fifth NTCIR Workshop
Atsushi Fujii | Makoto Iwayama | Noriko Kando
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the test collections produced for the Patent Retrieval Task in the Fifth NTCIR Workshop. We performed the invalidity search task, in which each participant group searches a patent collection for the patents that can invalidate the demand in an existing claim. For this purpose, we performed both document and passage retrieval tasks. We also performed the automatic patent classification task using the F-term classification system. The test collections will be available to the public for research purposes.

Extracting Loanwords from Mongolian Corpora and Producing a Japanese-Mongolian Bilingual Dictionary
Badam-Osor Khaltar | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

A System for Summarizing and Visualizing Arguments in Subjective Documents: Toward Supporting Decision Making
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the Workshop on Sentiment and Subjectivity in Text

Modeling Impression in Probabilistic Transliteration into Chinese
LiLi Xu | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2004

Term Extraction from Korean Corpora via Japanese
Atsushi Fujii | Tetsuya Ishikawa | Jong-Hyeok Lee
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

Summarizing Encyclopedic Term Descriptions on the Web
Atsushi Fujii | Tetsuya Ishikawa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Collecting Spontaneously Spoken Queries for Information Retrieval
Tomoyosi Akiba | Atsushi Fujii | Katunobu Itou
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop
Atsushi Fujii | Makoto Iwayama | Noriko Kando
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Overview of Patent Retrieval Task at NTCIR-3
Makoto Iwayama | Atsushi Fujii | Noriko Kando | Akihiko Takano
Proceedings of the ACL-2003 Workshop on Patent Corpus Processing

A system for Japanese/English/Korean multilingual patent retrieval
Mitsuharu Makita | Shigeto Higuchi | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of Machine Translation Summit IX: System Presentations

In response to growing needs for cross-lingual patent retrieval, we propose PRIME (Patent Retrieval In Multilingual Environment system), in which users can retrieve and browse patents in foreign languages using only their native language. PRIME translates a query in the user language into the target language, retrieves patents relevant to the query, and translates retrieved patents into the user language. To update a translation dictionary, PRIME automatically extracts new translations from parallel patent corpora. In the current implementation, trilingual (J/E/K) patent retrieval is available. We describe the system design and its evaluation.

2002

A Method for Open-Vocabulary Speech-Driven Text Retrieval
Atsushi Fujii | Katunobu Itou | Tetsuya Ishikawa
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution
Kazuhiro Seki | Atsushi Fujii | Tetsuya Ishikawa
COLING 2002: The 19th International Conference on Computational Linguistics

Producing a Large-scale Encyclopedic Corpus over the Web
Atsushi Fujii | Katunobu Itou | Tetsuya Ishikawa
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Question Answering Using Encyclopedic Knowledge Generated from the Web
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

PRIME: a system for multi-lingual patent retrieval
Shigeto Higuchi | Masatoshi Fukui | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of Machine Translation Summit VIII

Organizing Encyclopedic Knowledge based on the Web and its Application to Question Answering
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

Applying machine translation to two-stage cross-language information retrieval
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language, and retrieve a limited number of foreign documents. Second, we machine translate only those documents into the user language, and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.
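
The two stages described in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' system: word-by-word dictionary lookup stands in for machine translation, word overlap stands in for the retrieval score, and every name, dictionary entry, and document below is hypothetical.

```python
def translate(text, dictionary):
    """Word-by-word lookup; unknown words pass through unchanged (toy stand-in for MT)."""
    return " ".join(dictionary.get(word, word) for word in text.split())


def overlap(query, text):
    """Number of distinct query words that also occur in the text (toy relevance score)."""
    return len(set(query.split()) & set(text.split()))


def two_stage_clir(query, docs, query_to_doc, doc_to_query, k=2):
    """Return the ids of the top-k documents, re-ranked after translation."""
    # Stage 1: translate the query into the document language and keep
    # only the k best-matching foreign documents.
    translated_query = translate(query, query_to_doc)
    candidates = sorted(
        docs, key=lambda d: overlap(translated_query, docs[d]), reverse=True
    )[:k]
    # Stage 2: translate only those k documents into the user language
    # and re-rank them against the original query.
    return sorted(
        candidates,
        key=lambda d: overlap(query, translate(docs[d], doc_to_query)),
        reverse=True,
    )


# Hypothetical data: romanized Japanese query, English documents.
query_to_doc = {"kikai": "machine", "honyaku": "translation"}
doc_to_query = {"machine": "kikai", "translation": "honyaku", "translating": "honyaku"}
docs = {
    "d1": "machine learning",
    "d2": "machine translating text",
    "d3": "speech recognition",
}

ranking = two_stage_clir("kikai honyaku", docs, query_to_doc, doc_to_query, k=2)
print(ranking)
```

In this toy run, d1 and d2 tie in the first stage, and the second stage moves d2 ahead because the document-side dictionary also covers "translating". Only two of the three documents are ever translated, which is the cost saving the abstract describes.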

A Novelty-based Evaluation Method for Information Retrieval
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Utilizing the World Wide Web as an Encyclopedia: Extracting Term Descriptions from Semi-Structured Texts
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics