Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation

Srujan P Mule, Aniketh Garikaparthi, Manasi Patwardhan


Abstract
As language models accelerate scientific research by automating hypothesis generation and implementation, a new bottleneck emerges: evaluating and filtering hundreds of AI-generated ideas without exhaustive experimentation. We ask whether LMs can learn to forecast the empirical success of research ideas before any experiments are run. We study comparative empirical forecasting: given a benchmark-specific research goal and two candidate ideas, predict which will achieve better benchmark performance. We construct a dataset of 11,488 idea pairs grounded in objective outcomes from PapersWithCode. While off-the-shelf 8B-parameter models struggle (30% accuracy), supervised fine-tuning (SFT) boosts accuracy to 77.1%, outperforming GPT-5 (61.1%). By framing evaluation as a reasoning task via Reinforcement Learning with Verifiable Rewards (RLVR), we train models to discover latent reasoning paths, achieving 71.35% accuracy with interpretable justifications. Through additional ablations and out-of-distribution tests, we show robustness to surface-level heuristics and transfer to both a cross-domain time-split test set and an independently constructed test set. Our results demonstrate that compute-efficient small language models can serve as effective, objective verifiers, offering a scalable path for autonomous scientific discovery.
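The comparative forecasting setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `IdeaPair` fields, the "A"/"B" label format, the `higher_is_better` flag, and the binary reward are all assumptions made for illustration. The sketch only shows how a recorded benchmark outcome can act as a verifiable label for a pairwise prediction.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IdeaPair:
    """One comparative example. All field names are hypothetical."""
    goal: str                      # benchmark-specific research goal
    idea_a: str                    # first candidate idea
    idea_b: str                    # second candidate idea
    score_a: float                 # recorded benchmark result for idea A
    score_b: float                 # recorded benchmark result for idea B
    higher_is_better: bool = True  # assumed flag for the metric direction

    def winner(self) -> str:
        """Return "A" or "B" from the recorded outcome.

        Assumes ties were filtered out beforehand; a tie falls through to "B".
        """
        if self.higher_is_better:
            a_wins = self.score_a > self.score_b
        else:
            a_wins = self.score_a < self.score_b
        return "A" if a_wins else "B"


def verifiable_reward(pair: IdeaPair, prediction: str) -> float:
    """Binary reward (an assumption): 1.0 if the prediction matches the outcome."""
    return 1.0 if prediction.strip().upper() == pair.winner() else 0.0


def accuracy(pairs: Sequence[IdeaPair], predictions: Sequence[str]) -> float:
    """Fraction of pairs whose predicted winner matches the recorded outcome."""
    if not pairs or len(pairs) != len(predictions):
        raise ValueError("pairs and predictions must be non-empty and equal in length")
    total = sum(verifiable_reward(p, y) for p, y in zip(pairs, predictions))
    return total / len(pairs)
```

In a sketch like this, the same function serves as both the evaluation metric and the reward signal, which is what makes the reward checkable against objective outcomes rather than human judgment.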
Anthology ID:
2026.findings-acl.1918
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
38491–38529
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1918/
Cite (ACL):
Srujan P Mule, Aniketh Garikaparthi, and Manasi Patwardhan. 2026. Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 38491–38529, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation (Mule et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1918.pdf
Checklist:
 2026.findings-acl.1918.checklist.pdf