Hanqing Yang
2026
Unified Thinker: A General Reasoning Core for Image Generation
Sashuai Zhou | Qiang Zhou | Jijin Hu | Hanqing Yang | Yue Cao | Junpeng Ma | Yinchao Ma | Jun Song | Tiezheng Ge | Cheng Yu | Bo Zheng | Zhou Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning–execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting how far current open-source models still lag behind. We argue that closing this gap requires not merely better visual generators, but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.