ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance
Kazi Tasnim Zinat, Saad Mohammad Abrar, Shoumik Saha, Sharmila Duppala, Saimadhav Naga Sakhamuri, Zhicheng Liu
Abstract
Vision-Language Models have shown both impressive capabilities and notable failures in data visualization understanding tasks, but we have limited understanding of how specific properties within a visualization type affect model performance. We present ProcVQA, a benchmark designed to analyze how VLM performance can be affected by structure type and structural density of visualizations depicting frequent patterns mined from sequence data. ProcVQA consists of mined process visualizations spanning three structure types (linear sequences, tree, graph) with varying levels of structural density (quantified using the number of nodes and edges), with expert-validated QA pairs on these visualizations. We evaluate 21 proprietary and open-source models on the dataset on two major tasks: visual data extraction (VDE) and visual question answering (VQA) (with four categories of questions). Our analysis reveals three key findings. First, models exhibit steep performance drops on multi-hop reasoning, with question type and structure type impacting the degradation. Second, structural density strongly affects VDE performance: hallucinations and extraction errors increase with edge density, even in frontier models. Third, extraction accuracy does not necessarily translate into strong reasoning ability. By isolating structural factors through controlled visualization generation, ProcVQA enables precise identification of VLM limitations. ProcVQA is available at: https://github.com/kzintas/ProcVQA.
- Anthology ID:
- 2025.findings-emnlp.1266
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 23316–23348
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1266/
- DOI:
- 10.18653/v1/2025.findings-emnlp.1266
- Cite (ACL):
- Kazi Tasnim Zinat, Saad Mohammad Abrar, Shoumik Saha, Sharmila Duppala, Saimadhav Naga Sakhamuri, and Zhicheng Liu. 2025. ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23316–23348, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance (Zinat et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1266.pdf