KoRC: Knowledge Oriented Reading Comprehension Benchmark for Deep Text Understanding

Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Juanzi Li, Lei Hou


Abstract
Deep text understanding, which requires connecting a given document with prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in a narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has two advantages, i.e., broad knowledge coverage and a flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgeable questions. Moreover, we use labels in knowledge bases rather than spans or choices as the final answers. We test state-of-the-art models on KoRC, and the experimental results show that the strongest baseline achieves only 68.3% and 30.0% F1 on the IID and OOD test sets, respectively. These results indicate that deep text understanding is still an unsolved challenge. We will release our dataset and baseline methods upon acceptance.
Anthology ID:
2023.findings-acl.743
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11689–11707
URL:
https://aclanthology.org/2023.findings-acl.743
DOI:
10.18653/v1/2023.findings-acl.743
Cite (ACL):
Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Juanzi Li, and Lei Hou. 2023. KoRC: Knowledge Oriented Reading Comprehension Benchmark for Deep Text Understanding. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11689–11707, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
KoRC: Knowledge Oriented Reading Comprehension Benchmark for Deep Text Understanding (Yao et al., Findings 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.743.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.743.mp4