Kazuya Kawahara
2016
‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Masayuki Asahara | Kazuya Kawahara | Yuya Takei | Hideto Masuoka | Yasuko Ohba | Yuki Torii | Toru Morii | Yuki Tanaka | Kikuo Maekawa | Sachi Kato | Hikari Konishi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a ten-billion-word web corpus for linguistic research. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system, named ‘BonTen’, which enables the ten-billion-word corpus to be queried by a string, a sequence of morphological information, or a subtree of the syntactic dependency structure.