Wei Xi
2025
Enhancing User-Controlled Text-to-Image Generation with Layout-Aware Personalization
Hongliang Luo | Wei Xi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent diffusion-based models have advanced text-to-image synthesis, yet they struggle to preserve fine visual details and to enable precise spatial control in personalized content. We propose **LayoutFlex**, a novel framework that combines a Perspective-Adaptive Feature Extraction system with a Spatial Control Mechanism. Our approach captures fine-grained details via cross-modal representation learning and attention refinement, while enabling precise subject placement through coordinate-aware attention and region-constrained optimization. Experiments show LayoutFlex outperforms prior methods in visual fidelity (DINO ↑10.8%) and spatial accuracy (AP 43.1±1.2 vs. 19.3). LayoutFlex supports both single- and multi-subject personalization, offering a powerful solution for controllable and coherent image generation in creative and interactive applications.