Zhenyu Wu
Other people with similar names: Zhenyu Wu (XJTU)
Unverified author pages with similar names: Zhenyu Wu
2026
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agents
Bowen Yang | Kaiming Jin | Zhenyu Wu | Zhaoyang Liu | Qiushi Sun | Zehao Li | JingJing Xie | Zhoumianze Liu | Fangzhi Xu | Kanzhi Cheng | Yian Wang | Qingyun Li | Yu Qiao | Zun Wang | Zichen Ding
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current agentic frameworks struggle with robustness in novel domains and long-horizon workflows due to the absence of visual-aware tutorial retrieval and the lack of granular control over historical visual context curation and pruning. To bridge these gaps, we introduce OS-Symphony, a holistic framework that comprises an Orchestrator coordinating two key innovations for robust automation: (1) a Reflection-Memory Agent that utilizes milestone-driven long-term memory to enable trajectory-level self-correction, effectively mitigating visual context loss in long-horizon tasks; (2) Versatile Tool Agents featuring a Multimodal Searcher that adopts a “SeeAct” paradigm to navigate a browser-based sandbox to synthesize live, visually aligned tutorials, thereby resolving fidelity issues in unseen scenarios. Experimental results demonstrate that OS-Symphony delivers substantial performance gains across varying model scales, establishing new state-of-the-art results on three online benchmarks, notably achieving 65.84% on OSWorld. All research assets will be made publicly available.