2023
UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation
Mai Omura | Hiroshi Matsuda | Masayuki Asahara | Aya Wakasa
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
In this study, we developed Universal Dependencies (UD) resources for spoken Japanese from the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is a large spoken-language corpus that covers a variety of everyday conversations in Japanese and includes word delimitation and part-of-speech annotation. We newly annotated the CEJC with Long Word Unit delimitation and Bunsetsu (Japanese phrase)-based dependencies, including Bunsetsu boundaries. The UD Japanese resources were constructed by applying hand-maintained conversion rules to the CEJC annotations: two types of word delimitation, part-of-speech tags, and Bunsetsu-based syntactic dependency relations. Furthermore, we examined various issues in constructing UD for the CEJC by comparing it with a written Japanese corpus and by evaluating UD parsing accuracy.
Spatial Information Annotation Based on the Double Cross Model
Yoshiko Kawabata | Mai Omura | Masayuki Asahara | Johane Takeuchi
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
2021
Word Delimitation Issues in UD Japanese
Mai Omura | Aya Wakasa | Masayuki Asahara
Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021)
2018
UD-Japanese BCCWJ: Universal Dependencies Annotation for the Balanced Corpus of Contemporary Written Japanese
Mai Omura | Masayuki Asahara
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
In this paper, we describe UD Japanese-BCCWJ, a corpus created by converting the Balanced Corpus of Contemporary Written Japanese (BCCWJ) to the UD annotation schema. The BCCWJ already provides dependency information at the level of the bunsetsu (a Japanese syntactic unit comparable to the phrase). We developed a program that converts the BCCWJ to UD based on this dependency structure, and the corpus is the result of fully automatic conversion with that program. UD Japanese-BCCWJ is the largest UD Japanese corpus and the second-largest of all UD corpora, comprising 1,980 documents, 57,109 sentences, and 1,273k words across six distinct domains.
Universal Dependencies Version 2 for Japanese
Masayuki Asahara | Hiroshi Kanayama | Takaaki Tanaka | Yusuke Miyao | Sumire Uematsu | Shinsuke Mori | Yuji Matsumoto | Mai Omura | Yugo Murawaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)