What do we expect from Multiple-choice QA Systems?

Krunal Shah; Nitish Gupta; Dan Roth

doi:10.18653/v1/2020.findings-emnlp.317

Abstract

The recent success of machine learning systems on various QA datasets could be interpreted as a significant improvement in models’ language understanding abilities. However, using various perturbations, multiple recent works have shown that good performance on a dataset might not indicate performance that correlates well with human expectations of models that “understand” language. In this work, we consider a top-performing model on several Multiple Choice Question Answering (MCQA) datasets and evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model’s inputs. Our results show that the model clearly falls short of our expectations and motivate a modified training approach that forces the model to better attend to the inputs. We show that the new training paradigm leads to a model that performs on par with the original model while better satisfying our expectations.

Anthology ID: 2020.findings-emnlp.317
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3547–3553
URL: https://preview.aclanthology.org/fix-sig-urls/2020.findings-emnlp.317/
DOI: 10.18653/v1/2020.findings-emnlp.317
Cite (ACL): Krunal Shah, Nitish Gupta, and Dan Roth. 2020. What do we expect from Multiple-choice QA Systems? In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3547–3553, Online. Association for Computational Linguistics.
Cite (Informal): What do we expect from Multiple-choice QA Systems? (Shah et al., Findings 2020)
PDF: https://preview.aclanthology.org/fix-sig-urls/2020.findings-emnlp.317.pdf
Video: https://slideslive.com/38940132
Data: QASC
