Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices

Md Tahmid Rahman Laskar; Mohammed Saidul Islam; Ridwan Mahbub; Mizanur Rahman; Amran Bhuiyan; Israt Jahan; Mir Tafseer Nayeem; Shafiq Joty; Enamul Hoque; Jimmy Huang

Abstract

Large Vision-Language Models (LVLMs) with only 7B parameters have shown promise as automated judges in chart comprehension tasks. However, tiny models (≤2B parameters) still perform poorly as judges, limiting their real-world use in resource-constrained settings. To address this, we propose two approaches for cost-efficient evaluation: (i) multi-criteria prompting, which combines separate evaluation criteria into a single query, and (ii) domain-adaptive transfer learning, in which we fine-tune a 2B-parameter VLM on synthetic judgments from a chart dataset to create ChartJudge. Experiments show that multi-criteria prompting exposes robustness gaps, causing a large drop in performance for 7B models, including specialized LVLM judges such as LLaVA-Critic. In addition, we find that our tiny LVLM (ChartJudge) can effectively transfer knowledge from one dataset to another, making it a more specialized model. Our fine-grained analysis across chart types and query complexities offers actionable insights into the trade-offs between model size, prompt design, and transferability, enabling scalable, low-cost evaluation for chart reasoning tasks. Our code and data will be made publicly available.
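
The contrast between per-criterion queries and a single multi-criteria query can be illustrated with a minimal sketch. This is not the authors' prompt template: the criteria names, the 1-5 scale, the prompt wording, and both function names are assumptions made purely for illustration.

```python
from typing import List

# Hypothetical criteria, chosen only for illustration; the abstract does not list them.
CRITERIA = ["factual correctness", "informativeness", "relevance"]


def single_criterion_prompts(question: str, answer: str, criteria: List[str]) -> List[str]:
    """Build one judge query per criterion (one model call each)."""
    return [
        f"Question about the chart: {question}\n"
        f"Model answer: {answer}\n"
        f"Rate the answer for {criterion} on a 1-5 scale."
        for criterion in criteria
    ]


def multi_criteria_prompt(question: str, answer: str, criteria: List[str]) -> str:
    """Build a single judge query that covers every criterion at once."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"Question about the chart: {question}\n"
        f"Model answer: {answer}\n"
        "Rate the answer on a 1-5 scale for each criterion below, "
        "giving one 'criterion: score' line per criterion.\n"
        f"{numbered}"
    )
```

Under these assumptions, the per-criterion variant needs one judge call per criterion, while the combined variant needs a single call, which is where the cost saving would come from. The abstract reports that this combined form also exposes robustness gaps in 7B judges.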

Anthology ID: 2025.emnlp-industry.134
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1906–1918
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.134/
Cite (ACL): Md Tahmid Rahman Laskar, Mohammed Saidul Islam, Ridwan Mahbub, Mizanur Rahman, Amran Bhuiyan, Israt Jahan, Mir Tafseer Nayeem, Shafiq Joty, Enamul Hoque, and Jimmy Huang. 2025. Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1906–1918, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices (Laskar et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.134.pdf