Kai Chen
2026
ComfyFlow: Benchmarking LLMs for AIGC Workflow Generation
Zhenran Xu | Yiyu Wang | Yunxin Li | Muyang Ye | Yangxue | Kai Chen | Longyue Wang | Weihua Luo | Baotian Hu | Min Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have shown promising advancements in tackling human-level tasks, wherein generating workflows for collaborative AI systems remains a critical and challenging step. To explore this frontier, we introduce ComfyFlow, a comprehensive benchmark to evaluate current LLMs’ ability to generate executable and instruction-following AIGC workflows in ComfyUI. The dataset includes 400 diverse visual generation tasks across 20 categories, supported by 10K training examples constructed from knowledge bases, which contain detailed annotations for 2,480 nodes and 3,298 workflows. We establish a systematic evaluation protocol that quantifies performance across multiple dimensions, ranging from basic format validity to multi-level hallucination rates. Our extensive evaluations show that: 1) ComfyFlow presents a substantial challenge even for top-tier proprietary LLMs such as GPT-5.1 and the Claude series; 2) Open-source models achieve new state-of-the-art results after post-training, yet struggle with long-horizon planning as the number of nodes increases; 3) Different post-training strategies offer complementary benefits in following instructions and mitigating hallucinations. By establishing both a challenging benchmark and a principled evaluation scheme, ComfyFlow lays the foundation for developing more intelligent and reliable collaborative AI systems.