Akihiko Kato


2018

pdf
Cooperating Tools for MWE Lexicon Management and Corpus Annotation
Yuji Matsumoto | Akihiko Kato | Hiroyuki Shindo | Toshio Morita
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

We present tools for lexicon and corpus management that offer cooperating functionality in corpus annotation. The former, named Cradle, stores a set of words and expressions where multi-word expressions are defined with their own part-of-speech information and internal syntactic structures. The latter, named ChaKi, manages text corpora with part-of-speech (POS) and syntactic dependency structure annotations. Those two tools cooperate so that the words and multi-word expressions stored in Cradle are directly referred to by ChaKi in conducting corpus annotation, and the words and expressions annotated in ChaKi can be output as a list of lexical entities that are to be stored in Cradle.

pdf
Construction of Large-scale English Verbal Multiword Expression Annotated Corpus
Akihiko Kato | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
English Multiword Expression-aware Dependency Parsing Including Named Entities
Akihiko Kato | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Because syntactic structures and spans of multiword expressions (MWEs) are independently annotated in many English syntactic corpora, they are generally inconsistent with respect to one another, which is harmful to the implementation of an aggregate system. In this work, we construct a corpus that ensures consistency between dependency structures and MWEs, including named entities. Further, we explore models that predict both MWE-spans and an MWE-aware dependency structure. Experimental results show that our joint model using additional MWE-span features achieves an MWE recognition improvement of 1.35 points over a pipeline model.

2016

pdf
Construction of an English Dependency Corpus incorporating Compound Function Words
Akihiko Kato | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The recognition of multiword expressions (MWEs) in a sentence is important for such linguistic analyses as syntactic and semantic parsing, because it is known that combining an MWE into a single token improves accuracy for various NLP tasks, such as dependency parsing and constituency parsing. However, MWEs are not annotated in Penn Treebank. Furthermore, when converting word-based dependency to MWE-aware dependency directly, one could combine nodes in an MWE into a single node. Nevertheless, this method often leads to the following problem: A node derived from an MWE could have multiple heads and the whole dependency structure including MWE might be cyclic. Therefore we converted a phrase structure to a dependency structure after establishing an MWE as a single subtree. This approach can avoid an occurrence of multiple heads and/or cycles. In this way, we constructed an English dependency corpus taking into account compound function words, which are one type of MWEs that serve as functional expressions. In addition, we report experimental results of dependency parsing using a constructed corpus.

pdf
Identification of Flexible Multiword Expressions with the Help of Dependency Structure Annotation
Ayaka Morimoto | Akifumi Yoshimoto | Akihiko Kato | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)

This paper presents our ongoing work on compilation of English multi-word expression (MWE) lexicon. We are especially interested in collecting flexible MWEs, in which some other components can intervene the expression such as “a number of” vs “a large number of” where a modifier of “number” can be placed in the expression and inherit the original meaning. We fiest collect possible candidates of flexible English MWEs from the web, and annotate all of their occurrences in the Wall Street Journal portion of Ontonotes corpus. We make use of word dependency strcuture information of the sentences converted from the phrase structure annotation. This process enables semi-automatic annotation of MWEs in the corpus and simultanaously produces the internal and external dependency representation of flexible MWEs.