Micah Mok


2026

One of the biggest missing capabilities in state-of-the-art AI systems is the ability to learn continually after deployment. However, implementing an inference-time learning system poses several challenges, including the large memory requirement of the gradient-based algorithms used to train state-of-the-art LLMs. Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms and have shown encouraging performance on specific LLM tasks. In this paper, we perform a more comprehensive analysis of ES and specifically evaluate its forgetting curves when training for a larger number of update steps. We find that although ES reaches performance close to that of GRPO on math and reasoning tasks, this comes with significant forgetting of prior abilities. We also show that the updates made by ES are much less sparse and have a larger L2 norm than the corresponding GRPO updates, which explains the contrasting forgetting curves of the two algorithms. With this study, we aim to highlight the issue of forgetting in gradient-free algorithms like ES and hope to inspire future work that mitigates it.
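To make the two update statistics mentioned above concrete, the following is a minimal sketch of how the L2 norm and sparsity of a parameter update could be measured. It is an illustration under stated assumptions, not the measurement code used in this study: the function name `update_stats`, the tolerance `tol`, and the definition of sparsity as the fraction of coordinates whose change is at most `tol` are all hypothetical choices, and the vectors are toy values rather than experimental results.

```python
import math


def update_stats(before, after, tol=1e-6):
    """Return (l2_norm, sparsity) of the update `after - before`.

    Sparsity is defined here (an assumption for illustration) as the
    fraction of coordinates whose absolute change is at most `tol`.
    """
    if len(before) != len(after):
        raise ValueError("parameter vectors must have the same length")
    if not before:
        raise ValueError("parameter vectors must be non-empty")
    # Per-coordinate change produced by the update.
    deltas = [a - b for a, b in zip(after, before)]
    l2_norm = math.sqrt(sum(d * d for d in deltas))
    unchanged = sum(1 for d in deltas if abs(d) <= tol)
    return l2_norm, unchanged / len(deltas)


# Toy parameter vectors (hypothetical values, not results from the paper).
base = [0.0, 0.0, 0.0, 0.0]
sparse_update = [0.0, 0.0, 0.0, 0.5]    # only one coordinate moves
dense_update = [0.5, -0.5, 0.5, -0.5]   # every coordinate moves

sparse_norm, sparse_frac = update_stats(base, sparse_update)
dense_norm, dense_frac = update_stats(base, dense_update)
print(sparse_norm, sparse_frac)
print(dense_norm, dense_frac)
```

In this toy case the dense update has both a larger L2 norm and a lower sparsity than the sparse one, which is the qualitative pattern the abstract describes for ES relative to GRPO. For a real model, the same statistics would be computed over the flattened weights of two checkpoints.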