Kexin Chu


2025

PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization
Dawei Xiang | Wenyan Xu | Kexin Chu | Tianqi Ding | Zixu Shen | Yiming Zeng | Jianchang Su | Wei Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image (T2I) models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context—often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought (CoT) reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
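The agent flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract names only the self-evaluation and feedback-tuning agents, so the other two roles here (`infer_intent` and `enrich_scene`) are assumptions drawn from the description of inferring hidden context and enriching scene details. The class name `PromptPipeline`, the tagged prompt templates, the retry limit, and the `stub_llm` stand-in are all hypothetical; a real system would pass in a call to an actual language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function that maps a text prompt to a text reply.
LLM = Callable[[str], str]


@dataclass
class PromptPipeline:
    """Illustrative four-step prompt refinement loop (assumed structure)."""

    llm: LLM
    max_rounds: int = 3  # assumed cap on self-evaluation retries
    history: List[str] = field(default_factory=list)

    def infer_intent(self, user_prompt: str) -> str:
        # Assumed agent 1: step-by-step reasoning about the hidden context.
        return self.llm(f"INTENT\nPROMPT: {user_prompt}\nThink step by step.")

    def enrich_scene(self, user_prompt: str, intent: str) -> str:
        # Assumed agent 2: expand the prompt with scene and background details.
        return self.llm(f"ENRICH\nPROMPT: {user_prompt}\nINTENT: {intent}")

    def self_evaluate(self, user_prompt: str, candidate: str) -> bool:
        # Self-evaluation agent: check the candidate against the original input.
        verdict = self.llm(
            f"EVALUATE\nORIGINAL: {user_prompt}\nCANDIDATE: {candidate}\n"
            "Answer YES or NO."
        )
        return verdict.strip().upper().startswith("YES")

    def apply_feedback(self, candidate: str, feedback: str) -> str:
        # Feedback-tuning agent: fold user feedback into the current prompt.
        return self.llm(f"REVISE\nPROMPT: {candidate}\nFEEDBACK: {feedback}")

    def run(self, user_prompt: str) -> str:
        intent = self.infer_intent(user_prompt)
        candidate = self.enrich_scene(user_prompt, intent)
        self.history.append(candidate)
        for _ in range(self.max_rounds):
            if self.self_evaluate(user_prompt, candidate):
                break
            # Candidate drifted from the original input: regenerate it.
            candidate = self.enrich_scene(user_prompt, intent)
            self.history.append(candidate)
        return candidate


def stub_llm(text: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if text.startswith("INTENT"):
        return "a calm, nostalgic mood"
    if text.startswith("ENRICH"):
        return "a lighthouse at dusk, soft golden light, oil painting style"
    if text.startswith("EVALUATE"):
        return "YES"
    if text.startswith("REVISE"):
        # Parse the tagged lines that follow the REVISE header.
        fields = dict(line.split(": ", 1) for line in text.splitlines()[1:])
        return f"{fields['PROMPT']}, {fields['FEEDBACK']}"
    return text


pipeline = PromptPipeline(llm=stub_llm)
refined = pipeline.run("a lighthouse")
revised = pipeline.apply_feedback(refined, "add stormy clouds")
print(refined)
print(revised)
```

Because the pipeline only ever produces text, the refined prompt can be handed to any T2I model, which is one way to read the model-agnostic design the abstract mentions.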