F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task

Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, Dongyan Zhao


Abstract
Figure-to-Text (F2T) tasks aim to convert structured figure information into natural language text, serving as a bridge between visual perception and language understanding. However, existing evaluation methods remain limited: 1) Reference-based methods can only capture shallow semantic similarities and rely on costly labeled reference text; 2) Reference-free methods depend on multimodal large language models, which suffer from low efficiency and instruction sensitivity; 3) Existing methods provide only sample-level evaluations, lacking interpretability and alignment with expert-level multi-dimensional evaluation criteria. Accordingly, we propose F2TEval, a five-dimensional reference-free evaluation method aligned with expert criteria, covering faithfulness, completeness, conciseness, logicality, and analysis, to support fine-grained evaluation. We design a lightweight mixture-of-experts model that incorporates independent scoring heads and applies the Hilbert-Schmidt Independence Criterion to optimize the disentanglement of scoring representations across dimensions. Furthermore, we construct F2TBenchmark, a human-annotated benchmark dataset covering 21 chart types and 35 application domains, to support research on F2T evaluation. Experimental results demonstrate our model’s superior performance and efficiency, outperforming Gemini-2.0 and Claude-3.5 with only 0.9B parameters.
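For readers unfamiliar with the Hilbert-Schmidt Independence Criterion (HSIC) mentioned in the abstract, the following is a minimal sketch of the standard biased empirical estimator, tr(KHLH) / (n - 1)^2, computed with Gaussian kernels. The abstract does not state which kernel, estimator, or loss weighting F2TEval uses, so the kernel choice, the bandwidth `sigma`, and the function names `gaussian_kernel` and `hsic` are illustrative assumptions, not the authors' implementation. A small HSIC value between two sets of representations suggests weak statistical dependence, which is the property a disentanglement penalty would encourage.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of x (shape: n x d)."""
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y (n rows each)."""
    n = x.shape[0]
    K = gaussian_kernel(x, sigma)
    L = gaussian_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Illustrative data (hypothetical, not from the paper):
# b depends strongly on a, while c is drawn independently of a.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 1))
b = a + 0.1 * rng.normal(size=(200, 1))
c = rng.normal(size=(200, 1))

dep = hsic(a, b)  # dependent pair: expected to be clearly larger
ind = hsic(a, c)  # independent pair: expected to be close to zero
print(f"HSIC(dependent)={dep:.4f}  HSIC(independent)={ind:.4f}")
```

In a training setup along the lines the abstract describes, a term like `hsic(h_i, h_j)` summed over pairs of per-dimension scoring representations could be added to the loss as a penalty; how F2TEval actually combines it with the scoring objective is detailed in the paper, not here.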
Anthology ID:
2025.emnlp-main.195
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3932–3948
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.195/
Cite (ACL):
Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, and Dongyan Zhao. 2025. F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3932–3948, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task (Yue et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.195.pdf
Checklist:
 2025.emnlp-main.195.checklist.pdf