F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task

Tan Yue; Rui Mao; Zilong Song; Zonghai Hu; Dongyan Zhao

Abstract

Figure-to-Text (F2T) tasks aim to convert structured figure information into natural language text, serving as a bridge between visual perception and language understanding. However, existing evaluation methods remain limited: 1) Reference-based methods can only capture shallow semantic similarities and rely on costly labeled reference text; 2) Reference-free methods depend on multimodal large language models, which suffer from low efficiency and instruction sensitivity; 3) Existing methods provide only sample-level evaluations, lacking interpretability and alignment with expert-level multi-dimensional evaluation criteria. Accordingly, we propose F2TEval, a five-dimensional reference-free evaluation method aligned with expert criteria, covering faithfulness, completeness, conciseness, logicality, and analysis, to support fine-grained evaluation. We design a lightweight mixture-of-experts model that incorporates independent scoring heads and applies the Hilbert-Schmidt Independence Criterion to optimize the disentanglement of scoring representations across dimensions. Furthermore, we construct F2TBenchmark, a human-annotated benchmark dataset covering 21 chart types and 35 application domains, to support research on F2T evaluation. Experimental results demonstrate our model’s superior performance and efficiency, outperforming Gemini-2.0 and Claude-3.5 with only 0.9B parameters.
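The abstract names the Hilbert-Schmidt Independence Criterion (HSIC) as the disentanglement objective but does not give the kernel, estimator, or loss weighting used. As a rough illustration only, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH) / (n - 1)^2, with Gaussian kernels, and sums it over pairs of per-dimension representations. The function names, the Gaussian kernel, the `sigma` bandwidth, and the pairwise-sum penalty are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for the rows of x (shape: n x d). Kernel choice is an assumption."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def pairwise_hsic_penalty(reps, sigma=1.0):
    """Hypothetical penalty: sum of HSIC over all pairs of representations.

    reps is a list of (n x d) arrays, e.g. one per evaluation dimension.
    """
    total = 0.0
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            total += hsic(reps[i], reps[j], sigma)
    return total
```

A lower penalty indicates that the representations are closer to statistically independent under the chosen kernel; in a training setup, a term like this would typically be added to the scoring loss with some weight, which the abstract does not specify.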

Anthology ID: 2025.emnlp-main.195
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 3932–3948
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.195/
Cite (ACL): Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, and Dongyan Zhao. 2025. F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3932–3948, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task (Yue et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.195.pdf
Checklist: 2025.emnlp-main.195.checklist.pdf
