‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’

Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi



Abstract
The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a ten-billion-word web corpus for linguistic research. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system ‘BonTen’, which enables the ten-billion-word corpus to be queried by string, by a sequence of morphological information, or by a subtree of the syntactic dependency structure.
Anthology ID:
C16-2006
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
Month:
December
Year:
2016
Address:
Osaka, Japan
Editor:
Hideo Watanabe
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
25–29
URL:
https://aclanthology.org/C16-2006
Cite (ACL):
Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, and Hikari Konishi. 2016. ‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pages 25–29, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’ (Asahara et al., COLING 2016)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/C16-2006.pdf