Reading-Time Annotations for “Balanced Corpus of Contemporary Written Japanese”

Masayuki Asahara, Hajime Ono, Edson T. Miyamoto


Abstract
The Dundee Eyetracking Corpus contains eyetracking data collected while native speakers of English and French read newspaper editorial articles. Similar resources for other languages are still rare, especially for languages in which words are not overtly delimited with spaces. This is a report on a project to build an eyetracking corpus for Japanese. Measurements were collected while 24 native speakers of Japanese read excerpts from the Balanced Corpus of Contemporary Written Japanese. Texts were presented with or without segmentation (i.e., with or without spaces at the boundaries between bunsetsu segments) and with two types of methodologies (eyetracking and self-paced reading presentation). Readers’ background information, including vocabulary-size estimation and Japanese reading-span score, was also collected. As an example of the possible uses for the corpus, we also report analyses investigating the phenomena of anti-locality.
Anthology ID:
C16-1066
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
684–694
URL:
https://aclanthology.org/C16-1066
Cite (ACL):
Masayuki Asahara, Hajime Ono, and Edson T. Miyamoto. 2016. Reading-Time Annotations for “Balanced Corpus of Contemporary Written Japanese”. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 684–694, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Reading-Time Annotations for “Balanced Corpus of Contemporary Written Japanese” (Asahara et al., COLING 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/C16-1066.pdf