Junpeng Wang

2026

While memory is a core component in agent systems, its behavioral impact in complex, long-horizon domains like machine learning engineering (MLE) remains poorly understood. Unlike short, reactive exchanges, MLE agents solve tasks through cycles of experimentation and improvement where past errors can inform future success. This paper presents a systematic study dissecting how memory influences agent behavior and performance across diverse MLE challenges. We first introduce a dynamic coding memory designed to capture and reuse debugging experiences, and integrate it into two representative agent paradigms: a sequential, chain-based agent that mirrors human-like iterative refinement, and a parallel, tree-based agent that performs broad, self-exploratory search in the code space. Our central finding is that the role of memory is contingent on the agent’s underlying architecture. For chain-based agents, memory proves highly beneficial, enabling them to avoid recurring mistakes and engage in more coherent, iterative refinement, which significantly improves reliability and task success. In contrast, for tree-based search agents, memory introduces a critical trade-off: it enhances procedural stability at the cost of constraining search diversity, which can prematurely narrow exploration and lead to suboptimal final solutions. These findings reveal a fundamental trade-off between procedural reliability and solution innovation modulated by memory, offering insights for designing more effective and robust MLE agents.
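The abstract does not describe how the dynamic coding memory is structured. The sketch below is one hypothetical way to picture "capturing and reusing debugging experiences". It assumes the memory stores (error, fix) pairs keyed by a normalized error signature and returns past fixes when a matching error recurs. `CodingMemory`, `DebugRecord`, `signature`, `add`, and `recall` are illustrative names only, not the paper's design.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class DebugRecord:
    """One remembered debugging experience (hypothetical schema)."""
    error_signature: str  # normalized form of the error message
    fix_note: str         # short description of what resolved it


@dataclass
class CodingMemory:
    """Hypothetical debugging memory: stores (error, fix) pairs and returns
    past fixes whose normalized error signature matches a new error."""
    records: List[DebugRecord] = field(default_factory=list)

    @staticmethod
    def signature(error_text: str) -> str:
        # Keep only the last traceback line, then mask quoted names and numbers
        # so that e.g. KeyError: 'price' and KeyError: 'target' collide.
        lines = error_text.strip().splitlines()
        last = lines[-1] if lines else ""
        last = re.sub(r"'[^']*'", "'<X>'", last)
        return re.sub(r"\d+", "<N>", last)

    def add(self, error_text: str, fix_note: str) -> None:
        """Record a resolved error together with the note describing its fix."""
        self.records.append(DebugRecord(self.signature(error_text), fix_note))

    def recall(self, error_text: str) -> List[str]:
        """Return fix notes for all stored errors with the same signature."""
        sig = self.signature(error_text)
        return [r.fix_note for r in self.records if r.error_signature == sig]


if __name__ == "__main__":
    memory = CodingMemory()
    memory.add("Traceback (most recent call last):\nKeyError: 'price'",
               "Check column names after the merge")
    print(memory.recall("KeyError: 'target'"))  # ['Check column names after the merge']
```

In a chain-based loop, an agent could call `recall()` whenever a run fails and prepend the returned notes to its next refinement step. Exact-signature matching is the simplest possible retrieval rule; a real system might use fuzzier matching or embeddings instead.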

2022


Motivated by the widespread interest in the cross-lingual transfer of NLP models from high-resource to low-resource languages, research on cross-lingual word embeddings (CLWEs) has gained much popularity over the years. Among the most successful and attractive CLWE models are unsupervised ones. These models pose the alignment task as a Wasserstein-Procrustes problem that aims to estimate a permutation matrix and an orthogonal matrix jointly. Most existing unsupervised CLWE models resort to optimal transport (OT) based methods to estimate the permutation matrix. However, linear programming algorithms and approximate Sinkhorn-based OT solvers for computing the permutation matrix scale cubically and quadratically, respectively, in the input size. This makes it impractical to compute OT distances exactly for larger sample sizes, resulting in a poor approximation of the permutation matrix and, consequently, a less robust learned transfer function, or mapper. This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP) that jointly estimates a permutation matrix and an orthogonal matrix. qWP relies on a quantization step to estimate the permutation matrix between two probability distributions, or measures. This approach substantially improves the approximation quality of empirical OT solvers for a fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the bilingual lexicon induction (BLI) task.
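The abstract describes qWP only at a high level, so the following is a rough illustration of the general recipe and not the authors' implementation. It takes the Wasserstein-Procrustes objective, min over orthogonal Q and permutation P of ||XQ − PY||_F², and minimizes it by alternation. With Q fixed, it solves an exact matching step. With P fixed, it uses the closed-form orthogonal Procrustes step, Q = UVᵀ where UΣVᵀ is the SVD of XᵀPY. Plain k-means stands in for the quantization step, which is an assumption, so that matching runs on k centroids instead of n points. All function names are hypothetical. The brute-force matcher is only viable for very small k; a practical version would use a Hungarian or OT solver.

```python
import itertools

import numpy as np


def procrustes(X, Y):
    """Orthogonal Q minimizing ||X @ Q - Y||_F: Q = U @ Vt, where U, S, Vt = svd(X.T @ Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's k-means, used here as the (assumed) quantization step."""
    rng = np.random.default_rng(seed)
    C = Z[rng.choice(len(Z), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance)
        labels = np.argmin(((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                C[j] = Z[labels == j].mean(axis=0)
    return C


def best_permutation(A, B):
    """Exact matching of k points to k points (uniform-weight OT) by brute force."""
    cost = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    rows = np.arange(len(A))
    return list(min(itertools.permutations(rows.tolist()),
                    key=lambda p: cost[rows, list(p)].sum()))


def quantized_wp(X, Y, k=5, rounds=10):
    """Alternate matching and Procrustes steps on k centroids instead of n points."""
    Cx, Cy = kmeans(X, k), kmeans(Y, k)
    Q = np.eye(X.shape[1])
    for _ in range(rounds):
        perm = best_permutation(Cx @ Q, Cy)  # permutation step with Q fixed
        Q = procrustes(Cx, Cy[perm])         # orthogonal step with P fixed
    return Q
```

In this sketch the matching step sees only k centroids, so its cost does not grow with the number of embeddings n. The k-means pass is the only part that touches all n points, and it costs O(nkd) per iteration. Each alternation step can only lower the centroid-level objective, but like any alternating scheme it may stop at a local optimum that depends on the initial Q.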

Co-authors

Chin-Chia Michael Yeh 1

Wei Zhang 1

Xinyu Zhao 1

Zhongfang Zhuang 1

Venues

AMTA 1
Findings 1
