Automatic and Reliable Evaluation for Academic Caption-to-Figure Generation with LMMs
Guanghui Ye, Huan Zhao, Qin Zhu, Fengnan Li, Jiaqi Li, Yixian Shen, Zhonghao Ren, Zhihua Jiang
Abstract
Existing datasets for evaluating text-to-image generation focus mostly on real-life images, which poses challenges for assessing academicfigure generation given real scientific captions, which is a hot topic in AI for Science. To fill the gap, we propose HE4AFG, a novel datasetwhich first provides a Holistic Evaluation for Academic caption-to-Figure Generation (AFG). Specifically, HE4AFG collects real figure captions from 8 scientific domains and finally generates 3,900 evaluation samples (particularly, including multi-panel figures) using 5 mainstream large multimodal models (LMMs). For each sample, we provide high-quality human ratings in terms of three aspects—scientific aesthetic (SA), topic relevance (TR), and attribute correctness (AC). Moreover, we present two trainable models: (1) HE4AFG-E, an automated Evaluation model for AFG, which generates aspect-aware training examples and then use them to train three aspect-specific evaluation modules via contrastive learning; (2) HE4AFG-R, an automated Refinement model, which generates and utilizes feedback on the quality of the figures (e.g., unfaithful elements) to continuously improve AFG. Extensive experiments on HE4AFG demonstrate the effectiveness and performance advantages of our models.- Anthology ID:
- 2026.acl-long.2055
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 44406–44423
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.2055/
- DOI:
- Cite (ACL):
- Guanghui Ye, Huan Zhao, Qin Zhu, Fengnan Li, Jiaqi Li, Yixian Shen, Zhonghao Ren, and Zhihua Jiang. 2026. Automatic and Reliable Evaluation for Academic Caption-to-Figure Generation with LMMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44406–44423, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Automatic and Reliable Evaluation for Academic Caption-to-Figure Generation with LMMs (Ye et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.2055.pdf