Can NLP Models Detect When One Publication Outweighs Twenty? Predicting Systematic Review Conclusion Changes

Ebrahim Alharbi, Mark Stevenson


Abstract
Systematic reviews underpin evidence-based medicine but can become outdated quickly when new evidence appears. We formulate a novel prediction task: given a review and new studies that have appeared since its publication, predict whether the review’s conclusions will change. A dataset of 3,326 Cochrane review-update pairs is constructed and a range of approaches is explored, including feature-based baselines, zero- and few-shot LLMs, and parameter-efficient fine-tuning. Fine-tuning Qwen2.5 14B achieves the highest AUC-ROC (70.4%).
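As a rough illustration of the task framing and the reported metric, the sketch below scores review-update pairs and computes AUC-ROC from scratch. The `ReviewUpdatePair` record, its field names, and the count-based scorer are hypothetical and are not taken from the paper; the scorer is deliberately naive and is not one of the paper's baselines.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ReviewUpdatePair:
    """Hypothetical record: one review plus the studies published after it."""
    review_abstract: str
    new_study_abstracts: List[str]
    conclusion_changed: bool  # gold label: did the update change the conclusions?


def auc_roc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def count_scorer(pair: ReviewUpdatePair) -> float:
    """Toy scorer (assumption): more new studies means a higher change score."""
    return float(len(pair.new_study_abstracts))


# Made-up toy data, for illustration only.
pairs = [
    ReviewUpdatePair("review A", ["s1"], True),             # one study, conclusions changed
    ReviewUpdatePair("review B", ["s1", "s2", "s3"], False),
    ReviewUpdatePair("review C", ["s1", "s2"], True),
    ReviewUpdatePair("review D", [], False),
]
labels = [p.conclusion_changed for p in pairs]
scores = [count_scorer(p) for p in pairs]
toy_auc = auc_roc(labels, scores)
```

On real data, any model that outputs a change score per pair could be passed through the same `auc_roc` function; a value of 0.5 corresponds to random ranking.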
Anthology ID:
2026.bionlp-1.68
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
843–852
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.68/
Cite (ACL):
Ebrahim Alharbi and Mark Stevenson. 2026. Can NLP Models Detect When One Publication Outweighs Twenty? Predicting Systematic Review Conclusion Changes. In BioNLP 2026, pages 843–852, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
Can NLP Models Detect When One Publication Outweighs Twenty? Predicting Systematic Review Conclusion Changes (Alharbi & Stevenson, BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.68.pdf