A MISMATCHED Benchmark for Scientific Natural Language Inference

Firoz Shaik, Mobashir Sadat, Nikita Gautam, Doina Caragea, Cornelia Caragea


Abstract
Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles. Existing datasets for this task are derived from various computer science (CS) domains, whereas non-CS domains are completely ignored. In this paper, we introduce a novel evaluation benchmark for scientific NLI, called MisMatched. The new MisMatched benchmark covers three non-CS domains: Psychology, Engineering, and Public Health, and contains 2,700 human-annotated sentence pairs. We establish strong baselines on MisMatched using both pre-trained Small Language Models (SLMs) and Large Language Models (LLMs). Our best-performing baseline achieves a Macro F1 of only 78.17%, illustrating the substantial headroom for future improvement. In addition to introducing the MisMatched benchmark, we show that incorporating sentence pairs with an implicit scientific NLI relation into model training improves performance on scientific NLI. We make our dataset and code publicly available on GitHub.
Anthology ID:
2025.findings-acl.1109
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21524–21538
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1109/
Cite (ACL):
Firoz Shaik, Mobashir Sadat, Nikita Gautam, Doina Caragea, and Cornelia Caragea. 2025. A MISMATCHED Benchmark for Scientific Natural Language Inference. In Findings of the Association for Computational Linguistics: ACL 2025, pages 21524–21538, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
A MISMATCHED Benchmark for Scientific Natural Language Inference (Shaik et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1109.pdf