Toshio Morita


2018

pdf
Cooperating Tools for MWE Lexicon Management and Corpus Annotation
Yuji Matsumoto | Akihiko Kato | Hiroyuki Shindo | Toshio Morita
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

We present tools for lexicon and corpus management that offer cooperating functionality in corpus annotation. The former, named Cradle, stores a set of words and expressions where multi-word expressions are defined with their own part-of-speech information and internal syntactic structures. The latter, named ChaKi, manages text corpora with part-of-speech (POS) and syntactic dependency structure annotations. Those two tools cooperate so that the words and multi-word expressions stored in Cradle are directly referred to by ChaKi in conducting corpus annotation, and the words and expressions annotated in ChaKi can be output as a list of lexical entities that are to be stored in Cradle.

2016

pdf
Demonstration of ChaKi.NET – beyond the corpus search system
Masayuki Asahara | Yuji Matsumoto | Toshio Morita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

ChaKi.NET is a corpus management system for dependency structure annotated corpora. After more than 10 years of continuous development, the system is now usable not only for corpus search, but also for visualization, annotation, labelling, and formatting for statistical analysis. This paper describes the various functions included in the current ChaKi.NET system.

2006

pdf
An Annotated Corpus Management Tool: ChaKi
Yuji Matsumoto | Masayuki Asahara | Kiyota Hashimoto | Yukio Tono | Akira Ohtani | Toshio Morita
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Large scale annotated corpora are very important not only inlinguistic research but also in practical natural language processingtasks since a number of practical tools such as Part-of-speech (POS) taggers and syntactic parsers are now corpus-based or machine learning-based systems which require some amount of accurately annotated corpora. This article presents an annotated corpus management tool that provides various functions that include flexible search, statistic calculation, and error correction for linguistically annotated corpora. The target of annotation covers POS tags, base phrase chunks and syntactic dependency structures. This tool aims at helping development of consistent construction of lexicon and annotated corpora to be used by researchers both in linguists and language processing communities.