Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation

Srujan P Mule; Aniketh Garikaparthi; Manasi Patwardhan

Abstract

As language models (LMs) accelerate scientific research by automating hypothesis generation and implementation, a new bottleneck emerges: evaluating and filtering hundreds of AI-generated ideas without exhaustive experimentation. We ask whether LMs can learn to forecast the empirical success of research ideas before any experiments are run. We study comparative empirical forecasting: given a benchmark-specific research goal and two candidate ideas, predict which will achieve better benchmark performance. We construct a dataset of 11,488 idea pairs grounded in objective outcomes from PapersWithCode. While off-the-shelf 8B-parameter models struggle (30% accuracy), supervised fine-tuning (SFT) dramatically boosts accuracy to 77.1%, outperforming GPT-5 (61.1%). By framing evaluation as a reasoning task via Reinforcement Learning with Verifiable Rewards (RLVR), we train models to discover latent reasoning paths, achieving 71.35% accuracy with interpretable justifications. Through additional ablations and out-of-distribution tests, we show robustness to surface-level heuristics and transfer to both a cross-domain time-split test set and an independently constructed test set. Our results demonstrate that compute-efficient small language models can serve as effective, objective verifiers, offering a scalable path toward autonomous scientific discovery.
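The task above reduces each example to a binary choice with an objectively checkable answer, which is what makes a verifiable reward possible. The following is a minimal sketch of that framing, not the authors' code or data pipeline. It assumes hypothetical (benchmark, idea, score) records in which a higher score is better, and a hypothetical `<answer>A</answer>` output format; the names `IdeaPair`, `build_pairs`, `verifiable_reward`, and `accuracy` are illustrative only.

```python
"""Hypothetical sketch of comparative forecasting with a binary verifiable reward.

Assumptions (not from the paper): records are (benchmark, idea_text, score)
tuples, a higher score is better, and the model answers with "<answer>A</answer>"
or "<answer>B</answer>".
"""
import re
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class IdeaPair:
    goal: str    # benchmark-specific research goal
    idea_a: str
    idea_b: str
    label: str   # "A" or "B": which idea scored higher on the benchmark


def build_pairs(records):
    """Group ideas by benchmark and emit one pair per combination with distinct scores."""
    by_goal = {}
    for goal, idea, score in records:
        by_goal.setdefault(goal, []).append((idea, score))
    pairs = []
    for goal, ideas in by_goal.items():
        for (idea_a, score_a), (idea_b, score_b) in combinations(ideas, 2):
            if score_a == score_b:
                continue  # ties have no verifiable winner, so they are skipped
            label = "A" if score_a > score_b else "B"
            pairs.append(IdeaPair(goal, idea_a, idea_b, label))
    return pairs


# Hypothetical answer format; only the letters A and B are accepted.
ANSWER_RE = re.compile(r"<answer>\s*([AB])\s*</answer>")


def verifiable_reward(completion: str, label: str) -> float:
    """Binary RLVR-style reward: 1.0 if the last answer tag matches the label, else 0.0."""
    answers = ANSWER_RE.findall(completion)
    if not answers:
        return 0.0  # unparseable output earns no reward
    return 1.0 if answers[-1] == label else 0.0


def accuracy(predictions, pairs):
    """Fraction of pairs whose predicted letter equals the ground-truth label."""
    if not pairs:
        raise ValueError("accuracy is undefined for an empty set of pairs")
    correct = sum(pred == pair.label for pred, pair in zip(predictions, pairs))
    return correct / len(pairs)


if __name__ == "__main__":
    records = [
        ("bench-1", "idea-1", 80.1),
        ("bench-1", "idea-2", 81.3),
        ("bench-1", "idea-3", 81.3),
    ]
    pairs = build_pairs(records)
    print(len(pairs), [p.label for p in pairs])
    print(verifiable_reward("Reasoning ... <answer>B</answer>", pairs[0].label))
```

With the three toy records, the pair (idea-2, idea-3) is dropped as a tie, leaving two pairs that are both labeled "B". Because the reward is a rule-based check against the recorded outcome, it can be computed for any sampled reasoning trace without a learned judge.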

Anthology ID: 2026.findings-acl.1918
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 38491–38529
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1918/
Cite (ACL): Srujan P Mule, Aniketh Garikaparthi, and Manasi Patwardhan. 2026. Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 38491–38529, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation (Mule et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1918.pdf
Checklist: 2026.findings-acl.1918.checklist.pdf