Coding Agents with Multimodal Browsing are Generalist Problem Solvers

Aditya Bharat Soni, Boxuan Li, Xingyao Wang, Valerie Chen, Graham Neubig


Abstract
Modern human labor is characterized by specialization; we train for years and develop particular tools that allow us to perform well across a variety of tasks. AI agents have been specialized in a similar way, but specialized agents with task-specific tools or architectures often fail to generalize beyond their intended scope. In this work, we ask: *can agents achieve generalizability across diverse domains with a small but well-chosen set of general tools?* We propose OpenHands-Versa, a single-agent system with a modest number of general tools, such as code execution, a search engine, a web browser, and a multimodal file viewer, for three practical domains: software engineering, deep research, and web browsing. Notably, OpenHands-Versa demonstrates superior or competitive performance over task-specific specialized agents on three challenging benchmarks: SWE-Bench Multimodal, GAIA, and The Agent Company, with absolute improvements in success rate of **9.1**, **1.3**, and **9.1** points, respectively. Thus, a *single-agent* system with general tools can achieve strong generalization across these domains. Furthermore, we find that specialist multi-agent systems do not generalize beyond their intended scope. These findings establish OpenHands-Versa as a strong baseline for future research.
Anthology ID:
2026.findings-eacl.318
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6052–6069
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.318/
Cite (ACL):
Aditya Bharat Soni, Boxuan Li, Xingyao Wang, Valerie Chen, and Graham Neubig. 2026. Coding Agents with Multimodal Browsing are Generalist Problem Solvers. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6052–6069, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Coding Agents with Multimodal Browsing are Generalist Problem Solvers (Soni et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.318.pdf
Checklist:
2026.findings-eacl.318.checklist.pdf