CARD: Cross-modal Agent Framework for Generative and Editable Residential Design
Pengyu Zeng, Jun Yin, Miao Zhang, Yuqin Dai, Jizhizi Li, ZhanXiang Jin, Shuai Lu
Abstract
In recent years, architectural design automation has made significant progress, but the complexity of open-world environments continues to make residential design a challenging task, often requiring experienced architects to perform multiple iterations and human-computer interactions. Therefore, assisting ordinary users in navigating these complex environments to generate and edit residential design is crucial. In this paper, we present the CARD framework, which leverages a system of specialized cross-modal agents to adapt to complex open-world environments. The framework includes a point-based cross-modal information representation (CMI-P) that encodes the geometry and spatial relationships of residential rooms, a cross-modal residential generation model, supported by our customized Text2FloorEdit model, that acts as the lead designer to create standardized floor plans, and an embedded expert knowledge base for evaluating whether the designs meet user requirements and residential codes, providing feedback accordingly. Finally, a 3D rendering module assists users in visualizing and understanding the layout. CARD enables cross-modal residential generation from free-text input, empowering users to adapt to complex environments without requiring specialized expertise.- Anthology ID:
- 2025.emnlp-main.473
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 9315–9330
- Language:
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.473/
- DOI:
- Cite (ACL):
- Pengyu Zeng, Jun Yin, Miao Zhang, Yuqin Dai, Jizhizi Li, ZhanXiang Jin, and Shuai Lu. 2025. CARD: Cross-modal Agent Framework for Generative and Editable Residential Design. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 9315–9330, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- CARD: Cross-modal Agent Framework for Generative and Editable Residential Design (Zeng et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.473.pdf