Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Chenglei Si; Yanzhe Zhang; Ryan Li; Zhengyuan Yang; Ruibo Liu; Diyi Yang

doi:10.18653/v1/2025.naacl-long.199

Abstract

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language models (MLLMs) directly convert visual designs into code implementations. In this work, we construct Design2Code – the first real-world benchmark for this task. Specifically, we manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics to assess how well current multimodal LLMs can generate the code implementations that directly render into the given reference webpages, given the screenshots as input. We also complement automatic metrics with comprehensive human evaluations to validate the performance ranking. To rigorously benchmark MLLMs, we test various multimodal prompting methods on frontier models such as GPT-4o, GPT-4V, Gemini, and Claude. Our fine-grained break-down metrics indicate that models mostly lag in recalling visual elements from the input webpages and generating correct layout designs.

Anthology ID: 2025.naacl-long.199
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 3956–3974
URL: https://preview.aclanthology.org/moar-dois/2025.naacl-long.199/
DOI: 10.18653/v1/2025.naacl-long.199
Cite (ACL): Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, and Diyi Yang. 2025. Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3956–3974, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering (Si et al., NAACL 2025)
PDF: https://preview.aclanthology.org/moar-dois/2025.naacl-long.199.pdf
