Abstract
Machine reading comprehension (MRC) is a challenging NLP task because it requires carefully handling every linguistic granularity, from word and sentence to passage. For extractive MRC, the answer span has been shown to be mostly determined by key evidence linguistic units, which in most cases are sentences. However, we recently discovered that, to different extents, sentences may not be clearly defined in many languages. This causes a so-called location unit ambiguity problem: when the sentence itself is not clearly defined, it becomes difficult for the model to determine which sentence exactly contains the answer span. Taking Chinese as a case study, we explain and analyze this linguistic phenomenon and propose a reader with Explicit Span-Sentence Predication to alleviate the problem. Our proposed reader achieves a new state of the art on Chinese MRC benchmarks and shows great potential for handling other languages.
- Anthology ID:
- 2021.findings-emnlp.202
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2348–2359
- URL:
- https://aclanthology.org/2021.findings-emnlp.202
- DOI:
- 10.18653/v1/2021.findings-emnlp.202
- Cite (ACL):
- Jiawei Wang, Hai Zhao, Yinggong Zhao, and Libin Shen. 2021. What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2348–2359, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension (Wang et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2021.findings-emnlp.202.pdf
- Data:
- CJRC, CMRC, CMRC 2018, DRCD, SQuAD
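As a minimal illustration of the location unit ambiguity described in the abstract (not the paper's method), the sketch below assumes two hypothetical segmentation conventions for Chinese text: splitting only at full stops, versus also splitting at commas and semicolons. The same answer span then falls in a different "sentence", of a different length, under each convention. The passage, answer, delimiter sets, and the function names `split_sentences` and `sentence_index` are all illustrative assumptions.

```python
import re


def split_sentences(passage: str, delimiters: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of segments that end at any delimiter.

    The delimiter set is a hypothetical choice; different choices yield
    different "sentences" for the same passage.
    """
    spans = []
    start = 0
    for match in re.finditer(f"[{re.escape(delimiters)}]", passage):
        end = match.end()  # keep the delimiter inside its segment
        spans.append((start, end))
        start = end
    if start < len(passage):  # trailing text without a final delimiter
        spans.append((start, len(passage)))
    return spans


def sentence_index(spans, answer_start, answer_end):
    """Index of the segment fully containing the answer, or None if it crosses a boundary."""
    for i, (s, e) in enumerate(spans):
        if s <= answer_start and answer_end <= e:
            return i
    return None


# Illustrative passage and answer (assumed, not taken from any dataset).
passage = "小明昨天去了北京，他参观了故宫。今天他回到了上海。"
answer = "故宫"
a_start = passage.index(answer)
a_end = a_start + len(answer)

for name, delims in [("full stops only", "。！？"), ("full stops + commas", "。！？，；")]:
    spans = split_sentences(passage, delims)
    idx = sentence_index(spans, a_start, a_end)
    s, e = spans[idx]
    print(name, len(spans), idx, passage[s:e])
# prints:
# full stops only 2 0 小明昨天去了北京，他参观了故宫。
# full stops + commas 3 1 他参观了故宫。
```

Under the coarser convention the answer's evidence unit is a 16-character segment, while under the finer one it is a 7-character segment with a different index, so a model asked to locate "the sentence containing the answer" faces a target that depends on an arbitrary segmentation choice.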