Mining Knowledge for Natural Language Inference from Wikipedia Categories

Mingda Chen, Zewei Chu, Karl Stratos, Kevin Gimpel


Abstract
Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations. To alleviate the need for labeled data, we introduce WikiNLI: a resource for improving model performance on NLI and LE tasks. It contains 428,899 pairs of phrases constructed from naturally annotated category hierarchies in Wikipedia. We show that we can improve strong baselines such as BERT and RoBERTa by pretraining them on WikiNLI and transferring the models to downstream tasks. We conduct systematic comparisons with phrases extracted from other knowledge bases such as WordNet and Wikidata and find that pretraining on WikiNLI gives the best performance. In addition, we construct WikiNLI in other languages, and show that pretraining on them improves performance on NLI tasks in the corresponding languages.
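To make "pairs of phrases constructed from category hierarchies" concrete, here is a minimal sketch of one way such pairs could be derived from a toy category graph. It is not the authors' procedure: the hierarchy, the function names (ancestors, make_pairs), and the three labels ("entailment" when the second phrase is an ancestor of the first, "reverse" when it is a descendant, "neutral" otherwise) are all hypothetical, chosen only to illustrate the idea that a more specific category entails a more general one.

```python
from collections import deque
from itertools import permutations

# Hypothetical toy hierarchy: category -> list of its direct parent categories.
PARENTS = {
    "Jazz pianists": ["Jazz musicians", "Pianists"],
    "Jazz musicians": ["Musicians"],
    "Pianists": ["Musicians"],
    "Rock drummers": ["Drummers"],
    "Drummers": ["Musicians"],
    "Musicians": [],
}

def ancestors(cat, parents):
    """Return every category reachable by following parent links (BFS).

    The `seen` set keeps this terminating even if the graph has cycles,
    which real category graphs can contain.
    """
    seen = set()
    queue = deque(parents.get(cat, []))
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents.get(p, []))
    return seen

def make_pairs(parents):
    """Label every ordered pair of categories (hypothetical 3-way scheme)."""
    anc = {c: ancestors(c, parents) for c in parents}
    pairs = []
    for a, b in permutations(sorted(parents), 2):
        if b in anc[a]:
            label = "entailment"  # a is more specific than b, so a entails b
        elif a in anc[b]:
            label = "reverse"     # b is more specific than a
        else:
            label = "neutral"     # no ancestor relation in either direction
        pairs.append((a, b, label))
    return pairs

pairs = make_pairs(PARENTS)
for premise, hypothesis, label in pairs[:5]:
    print(f"{premise!r} -> {hypothesis!r}: {label}")
```

Labeling by full ancestor reachability rather than direct parents only is a design choice of this sketch; a real resource would also need to subsample the many "neutral" pairs, which grow quadratically with the number of categories.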
Anthology ID:
2020.findings-emnlp.313
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3500–3511
URL:
https://aclanthology.org/2020.findings-emnlp.313
DOI:
10.18653/v1/2020.findings-emnlp.313
Cite (ACL):
Mingda Chen, Zewei Chu, Karl Stratos, and Kevin Gimpel. 2020. Mining Knowledge for Natural Language Inference from Wikipedia Categories. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3500–3511, Online. Association for Computational Linguistics.
Cite (Informal):
Mining Knowledge for Natural Language Inference from Wikipedia Categories (Chen et al., Findings 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.findings-emnlp.313.pdf
Optional supplementary material:
 2020.findings-emnlp.313.OptionalSupplementaryMaterial.zip
Code
 ZeweiChu/WikiNLI
Data
ANLI, GLUE, MultiNLI, XNLI