IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

Rifki Afina Putri, Alice Oh


Abstract
Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU), as it is often included in NLU benchmarks (Liang et al., 2020; Wilie et al., 2020). However, most MRC datasets contain only answerable questions, overlooking the importance of unanswerable questions. MRC models trained only on answerable questions will select the span that is most likely to be the answer, even when the answer does not actually exist in the given passage (Rajpurkar et al., 2018). This problem persists especially in medium- to low-resource languages like Indonesian. Existing Indonesian MRC datasets (Purwarianti et al., 2007; Clark et al., 2020) are still inadequate because of their small size and limited question types, i.e., they only cover answerable questions. To fill this gap, we build a new Indonesian MRC dataset called I(n)don’tKnow-MRC (IDK-MRC) by combining automatic and manual unanswerable question generation to minimize the cost of manual dataset construction while maintaining dataset quality. Combined with the existing answerable questions, IDK-MRC consists of more than 10K questions in total. Our analysis shows that our dataset significantly improves the performance of Indonesian MRC models, showing a large improvement for unanswerable questions.
Anthology ID: 2022.emnlp-main.465
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6918–6933
URL: https://aclanthology.org/2022.emnlp-main.465
DOI: 10.18653/v1/2022.emnlp-main.465
Cite (ACL): Rifki Afina Putri and Alice Oh. 2022. IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6918–6933, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension (Putri & Oh, EMNLP 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-main.465.pdf
Dataset: 2022.emnlp-main.465.dataset.zip