Sachi Kato


CHJ-WLSP: Annotation of ‘Word List by Semantic Principles’ Labels for the Corpus of Historical Japanese
Masayuki Asahara | Nao Ikegami | Tai Suzuki | Taro Ichimura | Asuko Kondo | Sachi Kato | Makoto Yamazaki
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This article presents a word-sense annotation for the Corpus of Historical Japanese: a mashed-up Japanese lexicon based on the ‘Word List by Semantic Principles’ (WLSP). The WLSP is a large-scale Japanese thesaurus that includes 98,241 entries with syntactic and hierarchical semantic categories. The historical WLSP is also compiled for the words in ancient Japanese. We utilized a morpheme-word sense alignment table to extract all possible word sense candidates for each word appearing in the target corpus. Then, we manually disambiguated the word senses for 647,751 words in the texts from the 10th century to 1910.


The Annotation of Antonym Information in the ‘Word List by Semantic Principles’
Sachi Kato | Masayuki Asahara | Nanami Moriyama | Makoto Yamazaki | Asami Ogiwara
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation


Annotation of ‘Word List by Semantic Principles’ Labels for the Balanced Corpus of Contemporary Written Japanese
Sachi Kato | Masayuki Asahara | Makoto Yamazaki
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation


Between Reading Time and Syntactic/Semantic Categories
Masayuki Asahara | Sachi Kato
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This article presents a contrastive analysis between reading time and syntactic/semantic categories in Japanese. We overlaid the reading time annotation of BCCWJ-EyeTrack and a syntactic/semantic category information annotation on the ‘Balanced Corpus of Contemporary Written Japanese’. Statistical analysis based on a mixed linear model showed that verbal phrases tend to have shorter reading times than adjectives, adverbial phrases, or nominal phrases. The results suggest that the preceding phrases associated with the presenting phrases promote the reading process to shorten the gazing time.


‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Masayuki Asahara | Kazuya Kawahara | Yuya Takei | Hideto Masuoka | Yasuko Ohba | Yuki Torii | Toru Morii | Yuki Tanaka | Kikuo Maekawa | Sachi Kato | Hikari Konishi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’, which enables the ten-billion-word corpus to be queried by string, by a sequence of morphological information, or by a subtree of the syntactic dependency structure.


BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text
Masayuki Asahara | Sachi Kato | Hikari Konishi | Mizuho Imada | Kikuo Maekawa
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 3, September 2014