Boxuan Li
2026
Coding Agents with Multimodal Browsing are Generalist Problem Solvers
Aditya Bharat Soni | Boxuan Li | Xingyao Wang | Valerie Chen | Graham Neubig
Findings of the Association for Computational Linguistics: EACL 2026
Modern human labor is characterized by specialization; we train for years and develop particular tools that allow us to perform well across a variety of tasks. AI agents are similarly specialized, with task-specific tools or architectures, but they often fail to generalize beyond their intended scope. In this work, we ask: *can agents achieve generalizability across diverse domains with a small but well-chosen set of general tools?* We propose OpenHands-Versa, a single-agent system with a modest number of general tools, such as code execution, a search engine, a web browser, and a multimodal file viewer, for three practical domains: software engineering, deep research, and web browsing. Notably, OpenHands-Versa demonstrates superior or competitive performance over task-specific specialized agents on three challenging benchmarks: SWE-Bench Multimodal, GAIA, and The Agent Company, with absolute improvements in success rate of **9.1**, **1.3**, and **9.1** points, respectively. Thus, our *single-agent* system can achieve strong generalization, suggesting that specialist agents may not be necessary for these domains. Furthermore, we find that specialist multi-agent systems do not generalize beyond their intended scope. These findings establish OpenHands-Versa as a strong baseline for future research.