Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability

Saku Sugawara; Yusuke Kido; Hikaru Yokono; Akiko Aizawa

doi:10.18653/v1/P17-1075

Abstract

Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems. In this study, two classes of metrics were adopted for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and highlighted the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggests that the readability of RC datasets does not directly affect the question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer.

Anthology ID: P17-1075
Volume: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2017
Address: Vancouver, Canada
Editors: Regina Barzilay, Min-Yen Kan
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 806–817
URL: https://aclanthology.org/P17-1075
DOI: 10.18653/v1/P17-1075
Cite (ACL): Saku Sugawara, Yusuke Kido, Hikaru Yokono, and Akiko Aizawa. 2017. Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 806–817, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability (Sugawara et al., ACL 2017)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/P17-1075.pdf
Video: https://vimeo.com/234958313
Data: MCTest, MS MARCO, NewsQA, SQuAD, Who-did-What
