A Japanese Chess Commentary Corpus
Shinsuke Mori, John Richardson, Atsushi Ushiku, Tetsuro Sasada, Hirotaka Kameko, Yoshimasa Tsuruoka
Abstract
In recent years there has been a surge of interest in the natural language prosessing related to the real world, such as symbol grounding, language generation, and nonlinguistic data search by natural language queries. In order to concentrate on language ambiguities, we propose to use a well-defined “real world,” that is game states. We built a corpus consisting of pairs of sentences and a game state. The game we focus on is shogi (Japanese chess). We collected 742,286 commentary sentences in Japanese. They are spontaneously generated contrary to natural language annotations in many image datasets provided by human workers on Amazon Mechanical Turk. We defined domain specific named entities and we segmented 2,508 sentences into words manually and annotated each word with a named entity tag. We describe a detailed definition of named entities and show some statistics of our game commentary corpus. We also show the results of the experiments of word segmentation and named entity recognition. The accuracies are as high as those on general domain texts indicating that we are ready to tackle various new problems related to the real world.- Anthology ID:
- L16-1225
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- 1415–1420
- Language:
- URL:
- https://aclanthology.org/L16-1225
- DOI:
- Cite (ACL):
- Shinsuke Mori, John Richardson, Atsushi Ushiku, Tetsuro Sasada, Hirotaka Kameko, and Yoshimasa Tsuruoka. 2016. A Japanese Chess Commentary Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1415–1420, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- A Japanese Chess Commentary Corpus (Mori et al., LREC 2016)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/L16-1225.pdf