Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus

Satoshi Sato; Suguru Matsuyoshi; Yohsuke Kondoh

Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus

Satoshi Sato, Suguru Matsuyoshi, Yohsuke Kondoh

Abstract

This paper describes a method of readability measurement of Japanese texts based on a newly compiled textbook corpus. The textbook corpus consists of 1,478 sample passages extracted from 127 textbooks of elementary school, junior high school, high school, and university; it is divided into thirteen grade levels and the total size is about a million characters. For a given text passage, the readability measurement method determines the grade level to which the passage is the most similar by using character-unigram models, which are constructed from the textbook corpus. Because this method does not require sentence-boundary analysis and word-boundary analysis, it is applicable to texts that include incomplete sentences and non-regular text fragments. The performance of this method, which is measured by the correlation coefficient, is considerably high (R > 0.9); in case that the length of a text passage is limited in 25 characters, the correlation coefficient is still high (R = 0.83).

Anthology ID:: L08-1230
Volume:: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:: May
Year:: 2008
Address:: Marrakech, Morocco
Editors:: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:: LREC
SIG:
Publisher:: European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:: http://www.lrec-conf.org/proceedings/lrec2008/pdf/165_paper.pdf
DOI:
Bibkey:
Cite (ACL):: Satoshi Sato, Suguru Matsuyoshi, and Yohsuke Kondoh. 2008. Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):: Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus (Sato et al., LREC 2008)
Copy Citation:
PDF:: http://www.lrec-conf.org/proceedings/lrec2008/pdf/165_paper.pdf

PDF Cite Search Fix data