Maya Ando


Resource of Wikipedias in 31 Languages Categorized into Fine-Grained Named Entities
Satoshi Sekine | Kouta Nakayama | Masako Nomoto | Maya Ando | Asuka Sumida | Koji Matsuda
Proceedings of the 29th International Conference on Computational Linguistics

This paper describes a resource of Wikipedias in 31 languages categorized into Extended Named Entity (ENE), which has 219 fine-grained NE categories. We first categorized 920K Japanese Wikipedia pages according to the ENE scheme using machine learning, followed by manual validation. We then organized a shared task on Wikipedia categorization in 30 languages. The training data were provided by the Japanese categorization and the language links, and the task was to categorize the Wikipedia pages in 30 languages, with no language links from Japanese Wikipedia (20M pages in total). Thirteen groups with 24 systems participated in the 2020 and 2021 tasks, sharing their outputs for resource-building. The Japanese categorization accuracy was 98.5%, and the best performance among the 30 languages ranged from 80 to 93 in F-measure. Using ensemble learning, we created outputs with an average F-measure of 86.8, which is 1.7 points better than the best single systems. The total size of the resource is 32.5M pages, including the training data. We call this resource creation scheme “Resource by Collaborative Contribution (RbCC)”. We also constructed structuring tasks (attribute extraction and link prediction) using RbCC under our ongoing project, “SHINRA.”


Analysis of Travel Review Data from Reader’s Point of View
Maya Ando | Shun Ishizaki
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis


Automatic Extraction of Hyponyms from Japanese Newspapers Using Lexico-syntactic Patterns
Maya Ando | Satoshi Sekine | Shun Ishizaki
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)


Extraction of Associative Attributes from Nouns and Quantitative Expression of Prototype Concept
Maya Ando | Jun Okamoto | Shun Ishizaki
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)