Abstract
Much of the recent progress in NLU was shown to be due to models learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
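For context, the evaluation setup the abstract describes (train on MNLI, test out-of-distribution on HANS) can be sketched as below. This is a minimal illustration assuming the Hugging Face `transformers` and `datasets` libraries, not the authors' pipeline (see the linked repo prajjwal1/generalize_lm_nli for that); the checkpoint name and the MNLI label ordering are placeholder assumptions.

```python
# Minimal sketch: evaluate an MNLI-finetuned model on HANS.
# The checkpoint name is a placeholder, not the authors' model.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "your-org/bert-base-finetuned-mnli"  # hypothetical MNLI checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

hans = load_dataset("hans", split="validation")

correct = 0
for ex in hans:
    enc = tokenizer(ex["premise"], ex["hypothesis"],
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**enc).logits.argmax(dim=-1).item()
    # MNLI is 3-way (entailment/neutral/contradiction); HANS is binary.
    # Assuming label id 0 = entailment, the other two classes collapse
    # into HANS's "non-entailment" (label 1).
    correct += int((0 if pred == 0 else 1) == ex["label"])

print(f"HANS accuracy: {correct / len(hans):.3f}")
```

Models that rely on MNLI's surface heuristics (e.g., lexical overlap) typically score near chance on the non-entailment half of HANS, which is what makes this a useful generalization probe.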
- Anthology ID: 2021.insights-1.18
- Volume: Proceedings of the Second Workshop on Insights from Negative Results in NLP
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Venue: insights
- Publisher: Association for Computational Linguistics
- Pages: 125–135
- URL: https://aclanthology.org/2021.insights-1.18
- DOI: 10.18653/v1/2021.insights-1.18
- Cite (ACL): Prajjwal Bhargava, Aleksandr Drozd, and Anna Rogers. 2021. Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics. In Proceedings of the Second Workshop on Insights from Negative Results in NLP, pages 125–135, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics (Bhargava et al., insights 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.insights-1.18.pdf
- Code: prajjwal1/generalize_lm_nli
- Data: MultiNLI