EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference

Abhilasha Ravichander; Aakanksha Naik; Carolyn Rose; Eduard Hovy

doi:10.18653/v1/K19-1033

Abstract

Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline, Q-REAS, that manipulates quantities symbolically. In comparison to the best-performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.

Anthology ID: K19-1033
Volume: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Mohit Bansal, Aline Villavicencio
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 349–361
URL: https://aclanthology.org/K19-1033
DOI: 10.18653/v1/K19-1033
Cite (ACL): Abhilasha Ravichander, Aakanksha Naik, Carolyn Rose, and Eduard Hovy. 2019. EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 349–361, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference (Ravichander et al., CoNLL 2019)
PDF: https://preview.aclanthology.org/improve-issue-templates/K19-1033.pdf
Code: AbhilashaRavichander/EQUATE
Data: MultiNLI
