Canmiao Zhou


2025

Syntax-Aware Retrieval Augmentation for Neural Symbolic Regression
Canmiao Zhou | Han Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Symbolic regression is a powerful technique for discovering mathematical expressions that best fit observed data. While neural symbolic regression methods based on large-scale pre-trained models perform well on simple tasks, their reliance on fixed parametric knowledge typically limits generalization to complex and diverse data distributions. To address this challenge, we propose a syntax-aware retrieval-augmented mechanism that leverages the syntactic structure of symbolic expressions to perform context-aware retrieval from a pre-constructed token datastore during inference. This mechanism enables the model to incorporate highly relevant non-parametric prior information to assist expression generation. Additionally, we design an entropy-based confidence network that dynamically adjusts the fusion strength between the neural and retrieved components by estimating predictive uncertainty. Extensive experiments on multiple symbolic regression benchmarks demonstrate that the proposed method significantly outperforms representative baselines, validating the effectiveness of retrieval augmentation in improving the generalization of neural symbolic regression models.
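
To make the retrieval-and-fusion idea concrete, the sketch below illustrates a generic kNN-style scheme in the spirit of the abstract: a token distribution is built from the nearest entries of a datastore, and the interpolation weight is driven by the entropy of the neural model's prediction (higher uncertainty gives the retrieved distribution more influence). All names (`retrieve_distribution`, `entropy_gate`, `fuse_distributions`) and the simple entropy-to-weight mapping are illustrative assumptions, not the authors' implementation, which uses a learned confidence network and syntax-aware retrieval keys.

```python
# Minimal sketch of kNN-style retrieval fusion with an entropy-based gate.
# Hypothetical stand-in for the paper's method; the real system retrieves
# with syntax-aware representations and learns the fusion weight.
import torch
import torch.nn.functional as F


def retrieve_distribution(query, keys, values, vocab_size, k=16, temperature=10.0):
    """Build a token distribution from the k nearest datastore entries.

    `keys` are stored context representations (assumed here to encode the
    syntactic state of the partial expression); `values` are the next-token
    ids recorded alongside them.
    """
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)      # (N,)
    knn_dists, knn_idx = dists.topk(k, largest=False)             # nearest k
    weights = F.softmax(-knn_dists / temperature, dim=-1)         # distance -> weight
    probs = torch.zeros(vocab_size)
    probs.scatter_add_(0, values[knn_idx], weights)               # accumulate per token id
    return probs


def entropy_gate(neural_probs):
    """Map predictive entropy to a fusion weight in [0, 1].

    A simple monotone surrogate for the learned confidence network: the more
    uncertain the neural model is, the more weight the retrieved
    distribution receives.
    """
    entropy = -(neural_probs * neural_probs.clamp_min(1e-12).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(neural_probs.size(-1))))
    return (entropy / max_entropy).clamp(0.0, 1.0)


def fuse_distributions(neural_probs, retrieved_probs):
    """Interpolate the neural and retrieval-based next-token distributions."""
    lam = entropy_gate(neural_probs)
    return (1 - lam) * neural_probs + lam * retrieved_probs


if __name__ == "__main__":
    vocab_size, dim, n_entries = 32, 64, 1000
    keys = torch.randn(n_entries, dim)                      # datastore keys
    values = torch.randint(0, vocab_size, (n_entries,))     # stored next-token ids
    query = torch.randn(dim)                                 # current decoding state
    neural_probs = F.softmax(torch.randn(vocab_size), dim=-1)
    retrieved = retrieve_distribution(query, keys, values, vocab_size)
    fused = fuse_distributions(neural_probs, retrieved)
    print(fused.sum())  # remains a valid probability distribution (~1.0)
```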