Abstract
To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a simple negative sampling technique to be particularly effective, even though it is typically used for datasets that include unanswerable questions, such as SQuAD 2.0. When applied in conjunction with per-domain sampling, our XLNet (Yang et al., 2019)-based submission achieved the second best Exact Match and F1 in the MRQA leaderboard competition.
- Anthology ID:
- D19-5829
- Volume:
- Proceedings of the 2nd Workshop on Machine Reading for Question Answering
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 220–227
- URL:
- https://aclanthology.org/D19-5829
- DOI:
- 10.18653/v1/D19-5829
- Cite (ACL):
- Shayne Longpre, Yi Lu, Zhucheng Tu, and Chris DuBois. 2019. An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pages 220–227, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering (Longpre et al., 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/D19-5829.pdf
- Data:
- BioASQ, DROP, DuoRC, HotpotQA, Natural Questions, NewsQA, RACE, SQuAD, SearchQA, TriviaQA