Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme
Abstract
We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources.- Anthology ID:
- D18-1007
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 67–81
- Language:
- URL:
- https://aclanthology.org/D18-1007
- DOI:
- 10.18653/v1/D18-1007
- Cite (ACL):
- Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation (Poliak et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/D18-1007.pdf