TVWorld: Foundations for Remote-Control TV Agents
Zhantao Ma, Quanfeng Lu, Shuai Zhong, Dahai Yu, Ping Luo, Michael Ng
Abstract
Recent large vision–language models (LVLMs) have demonstrated strong potential for device control. However, existing research has primarily focused on point-and-click (PnC) interaction, while remote-control (RC) interaction commonly encountered in everyday TV usage remains largely underexplored. To fill this gap, we introduce TVWorld, an offline graph-based abstraction of real-world TV navigation that enables reproducible and deployment-free evaluation. On this basis, we derive two complementary benchmarks that comprehensively assess TV-use capabilities: TVWorld-N for topology-aware navigation and TVWorld-G for focus-aware grounding. These benchmarks expose a key limitation of existing agents: insufficient topology awareness for focus-based, long-horizon TV navigation. Motivated by this finding, we propose a Topology-Aware Training framework that injects topology awareness into LVLMs. Using this framework, we develop TVTheseus, a foundation model specialized for TV navigation. TVTheseus achieves a success rate of 68.3 on TVWorld-N, surpassing strong closed-source baselines such as Gemini 3 Flash and establishing state-of-the-art (SOTA) performance. Additional analyses further provide valuable insights into the development of effective TV-use agents.
- Anthology ID:
- 2026.findings-acl.1792
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 35959–35984
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1792/
- Cite (ACL):
- Zhantao Ma, Quanfeng Lu, Shuai Zhong, Dahai Yu, Ping Luo, and Michael Ng. 2026. TVWorld: Foundations for Remote-Control TV Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35959–35984, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- TVWorld: Foundations for Remote-Control TV Agents (Ma et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1792.pdf