ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

Rachneet Kaur; Nishan Srishankar; Zhen Zeng; Sumitra Ganesh

ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Sumitra Ganesh

Abstract

Recent multimodal LLMs have shown promise in chart-based visual question answering, but their performance declines sharply on unannotated charts—those requiring precise visual interpretation rather than relying on textual shortcuts. To address this, we introduce ChartAgent, a novel agentic framework that explicitly performs visual reasoning directly within the chart’s spatial domain. Unlike textual chain-of-thought reasoning, ChartAgent iteratively decomposes queries into visual subtasks and actively manipulates and interacts with chart images through specialized actions such as drawing annotations, cropping regions (e.g., segmenting pie slices, isolating bars), and localizing axes, using a library of chart-specific vision tools to fulfill each subtask. This iterative reasoning process closely mirrors human cognitive strategies for chart comprehension. ChartAgent achieves state-of-the-art accuracy on the ChartBench and ChartX benchmarks, surpassing prior methods by up to 16.07% absolute gain overall and 17.31% on unannotated, numerically intensive queries. Furthermore, our analyses show that ChartAgent is (a) effective across diverse chart types, (b) achieves the highest scores across varying visual and reasoning complexity levels, and (c) serves as a plug-and-play framework that boosts performance across diverse underlying LLMs. Our work is among the first to demonstrate visually grounded reasoning for chart understanding using tool-augmented multimodal agents.

Anthology ID:: 2026.acl-long.843
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 18488–18555
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.843/
DOI:
Bibkey:
Cite (ACL):: Rachneet Kaur, Nishan Srishankar, Zhen Zeng, and Sumitra Ganesh. 2026. ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18488–18555, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering (Kaur et al., ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.843.pdf
Checklist:: 2026.acl-long.843.checklist.pdf

PDF Cite Search Checklist Fix data