@inproceedings{yue-etal-2025-f2teval,
    title = "{F}2{TE}val: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task",
    author = "Yue, Tan  and
      Mao, Rui  and
      Song, Zilong  and
      Hu, Zonghai  and
      Zhao, Dongyan",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.195/",
    pages = "3932--3948",
    ISBN = "979-8-89176-332-6",
    abstract = "Figure-to-Text (F2T) tasks aim to convert structured figure information into natural language text, serving as a bridge between visual perception and language understanding. However, existing evaluation methods remain limited: 1) Reference-based methods can only capture shallow semantic similarities and rely on costly labeled reference text; 2) Reference-free methods depend on multimodal large language models, which suffer from low efficiency and instruction sensitivity; 3) Existing methods provide only sample-level evaluations, lacking interpretability and alignment with expert-level multi-dimensional evaluation criteria. Accordingly, we propose F2TEval, a five-dimensional reference-free evaluation method aligned with expert criteria, covering faithfulness, completeness, conciseness, logicality, and analysis, to support fine-grained evaluation. We design a lightweight mixture-of-experts model that incorporates independent scoring heads and applies the Hilbert-Schmidt Independence Criterion to optimize the disentanglement of scoring representations across dimensions. Furthermore, we construct F2TBenchmark, a human-annotated benchmark dataset covering 21 chart types and 35 application domains, to support research on F2T evaluation. Experimental results demonstrate our model{'}s superior performance and efficiency, outperforming Gemini-2.0 and Claude-3.5 with only 0.9B parameters."
}
[F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task](https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.195/) (Yue et al., EMNLP 2025)