The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, Anders Søgaard
Abstract
Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.
- Anthology ID: 2020.acl-main.679
- Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
- Month: July
- Year: 2020
- Address: Online
- Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 7590–7604
- URL: https://aclanthology.org/2020.acl-main.679
- DOI: 10.18653/v1/2020.acl-main.679
- Cite (ACL): Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, and Anders Søgaard. 2020. The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7590–7604, Online. Association for Computational Linguistics.
- Cite (Informal): The Sensitivity of Language Models and Humans to Winograd Schema Perturbations (Abdou et al., ACL 2020)
- PDF: https://preview.aclanthology.org/landing_page/2020.acl-main.679.pdf
- Code: mhany90/enhanced_wsc + additional community code
- Data: GLUE, WSC, WinoGrande
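
The abstract describes probing language models with Winograd examples and perturbed variants such as synonym replacements. The sketch below is purely illustrative and is not the authors' evaluation code: it assumes a HuggingFace masked LM (`bert-base-uncased`) and a hypothetical schema pair, and it resolves the pronoun by substituting each candidate referent and comparing pseudo-log-likelihoods for both the original and a synonym-perturbed sentence.

```python
# Illustrative sketch only (not the paper's code): rank Winograd candidates by
# masked-LM pseudo-log-likelihood for an original and a perturbed schema.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-uncased"  # assumed model; any masked LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def pick_referent(template: str, candidates: list) -> str:
    """Return the candidate whose substitution scores higher under the LM."""
    return max(candidates, key=lambda c: pseudo_log_likelihood(template.format(c)))

# Hypothetical schema and a synonym-perturbed variant ("big" -> "large").
original = "The trophy doesn't fit in the suitcase because {} is too big."
perturbed = "The trophy doesn't fit in the suitcase because {} is too large."
candidates = ["the trophy", "the suitcase"]
print(pick_referent(original, candidates), pick_referent(perturbed, candidates))
```

Pseudo-log-likelihood scoring is one common way to rank whole sentences with a masked LM; the paper's actual models and scoring setup may differ, and the point of such a comparison is only to check whether a minimal perturbation flips the model's predicted referent.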