Teruaki Oka


2020

pdf bib
KOTONOHA: A Corpus Concordance System for Skewer-Searching NINJAL Corpora
Teruaki Oka | Yuichi Ishimoto | Yutaka Yagi | Takenori Nakamura | Masayuki Asahara | Kikuo Maekawa | Toshinobu Ogiso | Hanae Koiso | Kumiko Sakoda | Nobuko Kibe
Proceedings of the 12th Language Resources and Evaluation Conference

The National Institute for Japanese Language and Linguistics, Japan (NINJAL, Japan), has developed several types of corpora. For each corpus NINJAL provided an online search environment, ‘Chunagon’, which is a morphological-information-annotation-based concordance system made publicly available in 2011. NINJAL has now provided a skewer-search system ‘Kotonoha’ based on the ‘Chunagon’ systems. This system enables querying of multiple corpora by certain categories, such as register type and period.

2016

pdf bib
Original-Transcribed Text Alignment for Manyosyu Written by Old Japanese Language
Teruaki Oka | Tomoaki Kono
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

We are constructing an annotated diachronic corpora of the Japanese language. In part of thiswork, we construct a corpus of Manyosyu, which is an old Japanese poetry anthology. In thispaper, we describe how to align the transcribed text and its original text semiautomatically to beable to cross-reference them in our Manyosyu corpus. Although we align the original charactersto the transcribed words manually, we preliminarily align the transcribed and original charactersby using an unsupervised automatic alignment technique of statistical machine translation toalleviate the work. We found that automatic alignment achieves an F1-measure of 0.83; thus, each poem has 1–2 alignment errors. However, finding these errors and modifying them are less workintensiveand more efficient than fully manual annotation. The alignment probabilities can beutilized in this modification. Moreover, we found that we can locate the uncertain transcriptionsin our corpus and compare them to other transcriptions, by using the alignment probabilities.

2011

pdf bib
Automatic Labeling of Voiced Consonants for Morphological Analysis of Modern Japanese Literature
Teruaki Oka | Mamoru Komachi | Toshinobu Ogiso | Yuji Matsumoto
Proceedings of 5th International Joint Conference on Natural Language Processing