Abstract
Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement.- Anthology ID:
- D19-1630
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 6086–6091
- Language:
- URL:
- https://aclanthology.org/D19-1630
- DOI:
- 10.18653/v1/D19-1630
- Cite (ACL):
- Nanjiang Jiang and Marie-Catherine de Marneffe. 2019. Evaluating BERT for natural language inference: A case study on the CommitmentBank. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6086–6091, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating BERT for natural language inference: A case study on the CommitmentBank (Jiang & de Marneffe, EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/D19-1630.pdf