Yangxue Yangxue
2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang | Yangxue Yangxue | Longyue Wang | Zhenran Xu | Yiyu Wang | Yaowei Wang | Weihua Luo | Kaifu Zhang | Baotian Hu | Min Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Conditional image generation has gained significant attention for its ability to personalize content. However, the field faces challenges in developing task-agnostic, reliable, and explainable evaluation metrics. This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks. CIGEval utilizes large multimodal models (LMMs) as its core, integrating a multi-functional toolbox and establishing a fine-grained evaluation framework. Additionally, we synthesize evaluation trajectories for fine-tuning, empowering smaller LMMs to autonomously select appropriate tools and conduct nuanced analyses based on tool outputs. Experiments across seven prominent conditional image generation tasks demonstrate that CIGEval (GPT-4o version) achieves a high correlation of 0.4625 with human assessments, closely matching the inter-annotator correlation of 0.47. Notably, when implemented with 7B open-source LMMs using only 2.3K training trajectories, CIGEval surpasses the previous GPT-4o-based state-of-the-art method. These findings indicate that CIGEval holds great potential for automating evaluation of image generation tasks while maintaining human-level reliability.