Jiarui Sun


2026

The success of large language models (LLMs) across domains highlights their potential in scientific tasks, with molecular optimization a promising frontier. Traditionally, this optimization relies on iterative expert feedback to refine molecules toward desired properties, a process well aligned with LLMs’ strengths. **As an experience-driven task, molecular optimization depends critically on domain feedback and the accumulation of historical knowledge. However, no existing method fully leverages such feedback and historical knowledge with reasoning traces and chemical insights.** In this work, we propose F2R (Feedback to Reasoning), a conversational molecular optimization pipeline that enables LLMs to accumulate and retrieve past actions, rationales, and feedback. Like humans, LLMs can generate imperfect reasoning; F2R is the first framework to use detailed domain feedback to critique and improve this reasoning. This transforms LLMs from passive text generators into agentic experts that learn both actions and reasoning from experience. Consequently, F2R shows remarkable performance.
While memory is a core component in agent systems, its behavioral impact in complex, long-horizon domains like machine learning engineering (MLE) remains poorly understood. Unlike short, reactive exchanges, MLE agents solve tasks through cycles of experimentation and improvement where past errors can inform future success. This paper presents a systematic study dissecting how memory influences agent behavior and performance across diverse MLE challenges. We first introduce a dynamic coding memory designed to capture and reuse debugging experiences, and integrate it into two representative agent paradigms: a sequential, chain-based agent that mirrors human-like iterative refinement, and a parallel, tree-based agent that performs broad, self-exploratory search in the code space. Our central finding is that the role of memory is contingent on the agent’s underlying architecture. For chain-based agents, memory proves highly beneficial, enabling them to avoid recurring mistakes and engage in more coherent, iterative refinement, which significantly improves reliability and task success. In contrast, for tree-based search agents, memory introduces a critical trade-off: it enhances procedural stability at the cost of constraining search diversity, which can prematurely narrow exploration and lead to suboptimal final solutions. These findings reveal a fundamental trade-off between procedural reliability and solution innovation modulated by memory, offering insights for designing more effective and robust MLE agents.
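To make the idea of a memory that captures and reuses debugging experiences more concrete, here is a minimal Python sketch. It is an illustration only and not the memory design from the paper, which the abstract does not specify: the names `DebugRecord`, `DebugMemory`, `add`, and `retrieve` are hypothetical, and the word-overlap (Jaccard) similarity is an assumed stand-in for whatever retrieval the authors actually use.

```python
from dataclasses import dataclass, field


@dataclass
class DebugRecord:
    """One stored debugging experience (hypothetical structure)."""
    error: str  # error message observed during a failed run
    fix: str    # short note on what resolved it


@dataclass
class DebugMemory:
    """Toy memory: stores past errors and returns the most similar ones."""
    records: list = field(default_factory=list)

    def add(self, error: str, fix: str) -> None:
        self.records.append(DebugRecord(error, fix))

    @staticmethod
    def _tokens(text: str) -> set:
        # Lowercased whitespace tokens; a deliberately crude representation.
        return set(text.lower().split())

    def retrieve(self, error: str, k: int = 1) -> list:
        """Return up to k records ranked by Jaccard overlap with the query."""
        query = self._tokens(error)
        scored = []
        for rec in self.records:
            cand = self._tokens(rec.error)
            union = query | cand
            score = len(query & cand) / len(union) if union else 0.0
            scored.append((score, rec))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Drop records that share no tokens with the query.
        return [rec for score, rec in scored[:k] if score > 0]


memory = DebugMemory()
memory.add("ValueError: could not convert string to float",
           "encode categorical columns before fitting")
memory.add("CUDA out of memory", "reduce batch size")

hits = memory.retrieve("RuntimeError: CUDA out of memory while training")
print(hits[0].fix)
```

In a chain-based agent, a lookup like this could run before each retry so that a previously seen error is not repeated. In a tree-based agent, sharing the same retrieved fix across branches is one plausible way the narrowing of search diversity described above could arise.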