Large Language Model-Enhanced Multi-Armed Bandits

Jiahang Sun; Zhiyong Wang; Runhan Yang; Chenjun Xiao; John C.s. Lui; Zhongxiang Dai

Abstract

Large language models (LLMs) have been applied to sequential decision-making tasks such as multi-armed bandits (MAB), where an LLM is tasked with selecting an arm in each iteration. However, this direct arm selection approach is often suboptimal. We propose an alternative that combines classical MAB algorithms with LLMs: we keep a classical MAB framework and leverage the in-context learning capability of LLMs for reward prediction. First, we integrate the LLM-based predictor into Thompson sampling (TS) with a decaying temperature schedule to balance exploration and exploitation. We also incorporate the predictor into a regression oracle-based MAB algorithm with explicit exploration. Additionally, we extend our TS-based algorithm to dueling bandits, where only preference feedback between pairs of arms is available; this extension requires significant algorithmic modifications. Our empirical evaluations on synthetic MAB tasks show that our algorithms outperform LLM-based direct arm selection. Experiments on real-world text datasets further show that, in tasks where arms lack exploitable semantic meaning, our approach performs significantly better than direct arm selection.
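A minimal Python sketch of the TS-based idea, written under explicit assumptions rather than taken from the paper: the LLM is replaced by a hypothetical `mock_llm_predict` that perturbs an arm's empirical mean with temperature-scaled Gaussian noise, and the `t0 / sqrt(t)` temperature schedule, the Bernoulli rewards and the 0.5 default for unplayed arms are all illustrative choices. It shows only the control flow: each round, draw one predicted reward per arm at the current temperature and play the arm with the highest draw, so sampling randomness (rather than an explicit posterior) drives exploration.

```python
import math
import random
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]  # (arm index, observed reward) pairs


def mock_llm_predict(history: History, arm: int, temperature: float,
                     rng: random.Random) -> float:
    """Hypothetical stand-in for sampling an LLM reward prediction.

    A real predictor would serialize `history` into an in-context prompt,
    query the LLM at `temperature`, and parse a number from its reply.
    This toy version adds temperature-scaled Gaussian noise to the arm's
    empirical mean (0.5 if the arm has never been played).
    """
    rewards = [r for a, r in history if a == arm]
    mean = sum(rewards) / len(rewards) if rewards else 0.5
    spread = temperature / math.sqrt(len(rewards) + 1)
    return mean + rng.gauss(0.0, spread)


def llm_thompson_sampling(
    true_means: List[float],
    horizon: int,
    predict: Callable[[History, int, float, random.Random], float] = mock_llm_predict,
    t0: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """TS-style loop: one sampled prediction per arm, play the argmax."""
    rng = random.Random(seed)
    history: History = []
    choices: List[int] = []
    n_arms = len(true_means)
    for t in range(1, horizon + 1):
        # Assumed decaying schedule: high temperature early (more exploration),
        # lower temperature later (more exploitation).
        temperature = t0 / math.sqrt(t)
        samples = [predict(history, a, temperature, rng) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward drawn from the hidden true mean.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        history.append((arm, reward))
        choices.append(arm)
    return choices


if __name__ == "__main__":
    picks = llm_thompson_sampling([0.2, 0.5, 0.8], horizon=500)
    print("share of best arm in last 100 rounds:", picks[-100:].count(2) / 100)
```

Any callable with the same signature as `predict` can be passed in, for example a wrapper that formats `history` into a prompt and queries a real model. The toy mock is not meant to model LLM behavior; with it, the loop can occasionally settle on a suboptimal arm, so treat the printed share as a smoke test rather than a performance claim.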

Anthology ID: 2026.acl-long.368
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8130–8145
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.368/
Cite (ACL): Jiahang Sun, Zhiyong Wang, Runhan Yang, Chenjun Xiao, John C.s. Lui, and Zhongxiang Dai. 2026. Large Language Model-Enhanced Multi-Armed Bandits. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8130–8145, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Large Language Model-Enhanced Multi-Armed Bandits (Sun et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.368.pdf
Checklist: 2026.acl-long.368.checklist.pdf
