ChID: A Large-scale Chinese IDiom Dataset for Cloze Test

Chujie Zheng; Minlie Huang; Aixin Sun

doi:10.18653/v1/P19-1075

Abstract

Cloze-style reading comprehension in Chinese remains limited by a lack of diverse corpora. In this paper we propose ChID, a large-scale Chinese cloze test dataset that studies the comprehension of idioms, a unique language phenomenon in Chinese. In this corpus, the idioms in a passage are replaced by blank symbols, and the correct answer must be chosen from well-designed candidate idioms. We carefully study how the design of candidate idioms and the representation of idioms affect the performance of state-of-the-art models. Results show that machine accuracy is substantially worse than human accuracy, indicating substantial room for further research.
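The task format described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the blank marker `#idiom#`, the `ClozeItem` fields, and the example passage are hypothetical and are not taken from the released ChID files, whose actual schema may differ.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical blank symbol; the released data may use a different marker.
BLANK = "#idiom#"


@dataclass
class ClozeItem:
    passage: str           # passage text with one idiom replaced by BLANK
    candidates: List[str]  # candidate idioms to choose from
    answer: int            # index of the correct candidate


def fill_blank(item: ClozeItem, choice: int) -> str:
    """Return the passage with the first blank replaced by the chosen candidate."""
    return item.passage.replace(BLANK, item.candidates[choice], 1)


def accuracy(items: List[ClozeItem], predictions: List[int]) -> float:
    """Fraction of items whose predicted index equals the answer index."""
    if not items:
        return 0.0
    correct = sum(pred == item.answer for item, pred in zip(items, predictions))
    return correct / len(items)


# A made-up item for illustration only (not drawn from ChID).
example = ClozeItem(
    passage="他的解释本来已经很清楚，再多说就是" + BLANK + "了。",
    candidates=["画蛇添足", "守株待兔", "亡羊补牢"],
    answer=0,
)

if __name__ == "__main__":
    print(fill_blank(example, example.answer))
    print(accuracy([example], [0]))
```

A model for this task would score each candidate in the context of the passage and return the index of the best one; `accuracy` then compares those indices with the gold answers.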

Anthology ID: P19-1075
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 778–787
URL: https://preview.aclanthology.org/fix-sig-urls/P19-1075/
DOI: 10.18653/v1/P19-1075
Cite (ACL): Chujie Zheng, Minlie Huang, and Aixin Sun. 2019. ChID: A Large-scale Chinese IDiom Dataset for Cloze Test. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 778–787, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): ChID: A Large-scale Chinese IDiom Dataset for Cloze Test (Zheng et al., ACL 2019)
PDF: https://preview.aclanthology.org/fix-sig-urls/P19-1075.pdf
Code: chujiezheng/ChID-Dataset
Data: ChID