Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Guo Gan; Yuxuan Ding; Cong Chen (陈聪); Yuwei Ren; Yin Huang; Hong Zhou

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang, Hong Zhou

Abstract

Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully exploring each costly emulator state. In this paper, we propose Android Coach, a novel framework that shifts the training paradigm to Single State Multiple Actions, allowing the agent to sample and utilize multiple actions for a single online state. We enable this without additional emulator overhead by learning a critic that estimates action values. To ensure the critic serves as a reliable coach, we integrate a process reward model and introduce a group-wise advantage estimator based on the averaged critic outputs. Extensive experiments demonstrate the effectiveness and efficiency of Android Coach: it achieves 7.5% and 8.3% success rate improvements on AndroidLab and AndroidWorld over UI-TARS-1.5-7B, and attains 1.4x higher training efficiency than Single State Single Action methods PPO and GRPO at matched success rates.

Anthology ID:: 2026.acl-long.613
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 13407–13428
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.613/
DOI:
Bibkey:
Cite (ACL):: Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang, and Hong Zhou. 2026. Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13407–13428, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions (Gan et al., ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.613.pdf
Checklist:: 2026.acl-long.613.checklist.pdf

PDF Cite Search Checklist Fix data