Firoz Shaik

2025

A MISMATCHED Benchmark for Scientific Natural Language Inference
Firoz Shaik | Mobashir Sadat | Nikita Gautam | Doina Caragea | Cornelia Caragea
Findings of the Association for Computational Linguistics: ACL 2025

Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles. Existing datasets for this task are derived from various computer science (CS) domains, whereas non-CS domains are completely ignored. In this paper, we introduce a novel evaluation benchmark for scientific NLI, called MisMatched. The new MisMatched benchmark covers three non-CS domains (Psychology, Engineering, and Public Health) and contains 2,700 human-annotated sentence pairs. We establish strong baselines on MisMatched using both pre-trained Small Language Models (SLMs) and Large Language Models (LLMs). Our best-performing baseline achieves a Macro F1 of only 78.17%, illustrating the substantial headroom for future improvements. In addition to introducing the MisMatched benchmark, we show that incorporating sentence pairs with an implicit scientific NLI relation into model training improves the models' performance on scientific NLI. We make our dataset and code publicly available on GitHub.
