Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma


Abstract
Integrating large language models (LLMs) as action proposers in reinforcement learning (RL) significantly boosts performance in text-based environments but incurs prohibitive computational costs. We introduce a cache-efficient framework for Bayesian RL that leverages LLM-derived action suggestions, drastically reducing these costs while maintaining near-optimal performance. Our approach features an adaptive caching mechanism, optimized via meta-learning based on policy performance, to enable efficient inference across text-based games (e.g., TextWorld, ALFWorld) and robotic control tasks (e.g., MuJoCo, MetaWorld). This framework achieves a 3.8–4.7× reduction in LLM queries and 4.0–12.0× lower median latencies (85–93 ms on consumer hardware), while retaining 96–98% of the uncached policy’s performance. We provide theoretical guarantees on the reliability of cached decisions with Kullback-Leibler (KL) divergence bounds, which are validated empirically by high success rates (90.4–95.6%) in complex text environments. For offline RL, our proposed CQL-Prior variant improves performance by 14–29% and reduces training time by 38–40%. Evaluations across eight diverse tasks demonstrate the framework’s generalizability and practicality for resource-constrained settings, making LLM-guided RL a viable and accessible approach for both text-based and robotic applications.
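To make the abstract's mechanism concrete, here is a minimal Python sketch; it is not the authors' implementation. `PriorCache`, `query_llm`, the cosine-distance reuse threshold (standing in for the paper's KL-divergence reliability bound), and the Boltzmann-weighted posterior are all illustrative assumptions. The sketch shows how cached LLM action priors can be reused for nearby states and then combined with learned Q-values via posterior sampling.

```python
import numpy as np


class PriorCache:
    """Cache of LLM-derived action priors, keyed by state embeddings.

    A stored prior is reused when the query state is close enough to a
    cached state; the cosine-distance threshold below is a stand-in for
    the paper's KL-divergence-based reliability bound. Illustrative
    sketch only; names and reuse policy are assumptions, not the
    paper's API.
    """

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold          # reuse cutoff (hypothetical)
        self.keys: list[np.ndarray] = []    # cached state embeddings
        self.priors: list[np.ndarray] = []  # cached action distributions

    def lookup(self, state_emb: np.ndarray):
        """Return the nearest cached prior if it is close enough, else None."""
        if not self.keys:
            return None
        sims = [k @ state_emb / (np.linalg.norm(k) * np.linalg.norm(state_emb))
                for k in self.keys]
        i = int(np.argmax(sims))
        return self.priors[i] if (1.0 - sims[i]) <= self.threshold else None

    def insert(self, state_emb: np.ndarray, prior: np.ndarray) -> None:
        self.keys.append(state_emb)
        self.priors.append(prior)


def action_prior(state_emb, cache, query_llm):
    """Serve the prior from cache when possible; query the LLM on a miss."""
    prior = cache.lookup(state_emb)
    if prior is None:                 # cache miss: pay the LLM cost once
        prior = query_llm(state_emb)
        cache.insert(state_emb, prior)
    return prior


def sample_action(prior, q_values, temperature=1.0, rng=None):
    """Posterior sampling: reweight the LLM prior by Boltzmann-scaled
    Q-values, renormalize, and sample an action index."""
    rng = rng or np.random.default_rng()
    posterior = prior * np.exp(q_values / temperature)
    posterior /= posterior.sum()
    return int(rng.choice(len(posterior), p=posterior))
```

A typical call sequence would be `prior = action_prior(emb, cache, query_llm)` followed by `a = sample_action(prior, q_values)`, so the LLM is only queried on cache misses while action selection still blends the prior with the learned value estimates.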
Anthology ID:
2025.emnlp-main.560
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11062–11090
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.560/
DOI:
10.18653/v1/2025.emnlp-main.560
Cite (ACL):
Ibne Farabi Shihab, Sanjeda Akter, and Anuj Sharma. 2025. Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 11062–11090, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains (Shihab et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.560.pdf
Checklist:
2025.emnlp-main.560.checklist.pdf