Marissa Radensky
2025
CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation
Peter Jansen | Oyvind Tafjord | Marissa Radensky | Pao Siangliulue | Tom Hope | Bhavana Dalvi Mishra | Bodhisattwa Prasad Majumder | Daniel S Weld | Peter Clark
Findings of the Association for Computational Linguistics: ACL 2025
Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries.
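The abstract's framing of ideation and experiment construction as a genetic search over combinations of research articles and codeblocks can be illustrated with a small sketch. The article names, codeblocks, fitness function, and recombination scheme below are hypothetical placeholders for illustration only, not CodeScientist's actual implementation.

```python
import random

# Hypothetical sketch of genetic search over (research article, codeblock) combinations;
# the article names, codeblocks, and fitness function are placeholders, not
# CodeScientist's actual components.

ARTICLES = ["paper_A", "paper_B", "paper_C"]                 # candidate research articles
CODEBLOCKS = ["prompt_llm", "run_agent", "score_responses"]  # common domain actions

def random_idea():
    """An 'idea' pairs one article with a small set of codeblocks."""
    return (random.choice(ARTICLES), tuple(random.sample(CODEBLOCKS, k=2)))

def fitness(idea):
    """Stand-in score; the real system would construct and run an experiment here."""
    return random.random()

def evolve(pop_size=8, generations=5):
    population = [random_idea() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # keep the fittest combinations
        # Recombine: swap codeblock sets between pairs of surviving ideas.
        children = [(a, cb) for (a, _), (_, cb) in zip(parents, reversed(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    print(evolve())
```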
Literature-Grounded Novelty Assessment of Scientific Ideas
Simra Shahid | Marissa Radensky | Raymond Fok | Pao Siangliulue | Daniel S Weld | Tom Hope
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Automated scientific idea generation systems have made remarkable progress, yet the automatic evaluation of idea novelty remains a critical and underexplored challenge. Manual evaluation of novelty through literature review is labor-intensive, prone to error due to subjectivity, and impractical at scale. To address these issues, we propose the **Idea Novelty Checker**, an LLM-based retrieval-augmented generation (RAG) framework that leverages a two-stage retrieve-then-rerank approach. The Idea Novelty Checker first collects a broad set of relevant papers using keyword and snippet-based retrieval, then refines this collection through embedding-based filtering followed by facet-based LLM re-ranking. It incorporates expert-labeled examples to guide the system in comparing papers for novelty evaluation and in generating literature-grounded reasoning. Our extensive experiments demonstrate that our novelty checker achieves approximately 13% higher agreement than existing approaches. Ablation studies further showcase the importance of the facet-based re-ranker in identifying the most relevant literature for novelty evaluation.
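As a rough illustration of the two-stage retrieve-then-rerank pipeline described in the abstract, the sketch below stubs out retrieval, embeddings, and the LLM calls; the helper names, toy embedding, and facet list are assumptions made for illustration, not the Idea Novelty Checker's actual API.

```python
from math import sqrt

# Illustrative two-stage retrieve-then-rerank novelty check; the retrieval,
# embedding, and LLM steps are stubbed placeholders, not the paper's implementation.

def embed(text):
    # Toy bag-of-letters embedding standing in for a real encoder.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def check_novelty(idea, corpus, llm_rerank, llm_judge, top_k=20):
    """Stage 1: broad keyword/snippet retrieval plus embedding-based filtering.
    Stage 2: facet-based LLM re-ranking and a literature-grounded novelty judgment.
    `llm_rerank` and `llm_judge` are caller-supplied placeholders for LLM calls."""
    keywords = set(idea.lower().split())
    candidates = [p for p in corpus if keywords & set(p["abstract"].lower().split())]
    idea_vec = embed(idea)
    candidates.sort(key=lambda p: cosine(idea_vec, embed(p["abstract"])), reverse=True)
    shortlist = candidates[:top_k]
    ranked = llm_rerank(idea, shortlist, facets=["problem", "method", "evaluation"])
    return llm_judge(idea, ranked)  # literature-grounded novelty verdict
```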
Co-authors
- Tom Hope 2
- Pao Siangliulue 2
- Daniel S. Weld 2
- Peter Clark 1
- Bhavana Dalvi Mishra 1