CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense
Michael Chen, Mike D’Arcy, Alisa Liu, Jared Fernandez, Doug Downey
Abstract
Commonsense reasoning is a critical AI capability, but it is difficult to construct challenging datasets that test common sense. Recent neural question answering systems, based on large pre-trained models of language, have already achieved near-human-level performance on commonsense knowledge benchmarks. These systems do not possess human-level common sense, but are able to exploit limitations of the datasets to achieve human-level scores. We introduce the CODAH dataset, an adversarially-constructed evaluation dataset for testing common sense. CODAH forms a challenging extension to the recently-proposed SWAG dataset, which tests commonsense knowledge using sentence-completion questions that describe situations observed in video. To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems. Workers are rewarded for submissions that models fail to answer correctly both before and after fine-tuning (in cross-validation). We create 2.8k questions via this procedure and evaluate the performance of multiple state-of-the-art question answering systems on our dataset. We observe a significant gap between human performance, which is 95.3%, and the best baseline accuracy of 65.3%, achieved by the OpenAI GPT model.
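As a point of reference for the kind of sentence-completion evaluation described above, the following is a minimal, illustrative sketch of scoring candidate completions with a pretrained causal language model and choosing the most likely one. The paper's baseline fine-tunes the OpenAI GPT model; this sketch instead assumes an off-the-shelf GPT-2 through the Hugging Face transformers library purely for illustration, so the model, scoring setup, and resulting accuracy are not those of the reported 65.3% baseline, and the example item is hypothetical rather than drawn from CODAH.

```python
# Illustrative sketch only: a zero-shot sentence-completion baseline in the
# spirit of the CODAH evaluation. The paper fine-tunes OpenAI GPT; here we
# assume GPT-2 via Hugging Face transformers, so results will not match.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def completion_logprob(prompt: str, completion: str) -> float:
    """Average log-probability of the completion tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    compl_ids = tokenizer(" " + completion, return_tensors="pt").input_ids
    full_ids = torch.cat([prompt_ids, compl_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given its preceding context.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the completion tokens (everything after the prompt).
    return token_lp[0, prompt_ids.shape[1] - 1:].mean().item()


def predict(prompt: str, candidates: list[str]) -> int:
    """Index of the candidate completion the language model scores highest."""
    scores = [completion_logprob(prompt, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical CODAH-style item (not taken from the dataset).
answer = predict(
    "A man on his first date was nervous.",
    ["He sweated and stumbled over his words.",
     "He calculated the orbit of Jupiter.",
     "He turned into a pumpkin.",
     "He sold his car to the waiter."],
)
```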
- Anthology ID: W19-2008
- Volume: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
- Month: June
- Year: 2019
- Address: Minneapolis, USA
- Editors: Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Yoav Goldberg
- Venue: RepEval
- Publisher: Association for Computational Linguistics
- Pages: 63–69
- URL: https://preview.aclanthology.org/icon-24-ingestion/W19-2008/
- DOI: 10.18653/v1/W19-2008
- Cite (ACL): Michael Chen, Mike D’Arcy, Alisa Liu, Jared Fernandez, and Doug Downey. 2019. CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense. In Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pages 63–69, Minneapolis, USA. Association for Computational Linguistics.
- Cite (Informal): CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense (Chen et al., RepEval 2019)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/W19-2008.pdf
- Code: Websail-NU/CODAH
- Data: CODAH, SNLI, SQuAD, SWAG