Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

Ryan Li, Yanzhe Zhang, Diyi Yang


Abstract
Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas. However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting accessibility and impeding efficient design iteration. To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. Beyond end-to-end benchmarking, Sketch2Code supports interactive agent evaluation that mimics real-world design workflows, where a VLM-based agent iteratively refines its generations by communicating with a simulated user, either passively receiving feedback instructions or proactively asking clarification questions. We comprehensively analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs; even the most capable models struggle to accurately interpret sketches and formulate effective questions that lead to steady improvement. Nevertheless, a user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception, highlighting the need to develop more effective paradigms for multi-turn conversational assistants.
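
The interactive setting described in the abstract (an agent refining its webpage prototype over multiple turns with a simulated user, either passively receiving feedback or proactively asking clarification questions) can be illustrated with a minimal Python sketch. All names below (vlm_agent, simulated_user, and their methods) are hypothetical placeholders for illustration, not the benchmark's actual API.

# Minimal sketch of the multi-turn evaluation loop, assuming hypothetical
# vlm_agent and simulated_user objects; not the paper's implementation.
def interactive_eval(sketch_image, vlm_agent, simulated_user,
                     turns=3, proactive=False):
    """Iteratively refine a webpage prototype from a rudimentary sketch.

    Passive mode: the agent only receives feedback instructions.
    Proactive mode: the agent first asks a clarification question.
    """
    history = []  # running conversation context
    html = vlm_agent.generate(sketch_image, history)  # initial prototype

    for _ in range(turns):
        if proactive:
            # Agent formulates a clarification question about the sketch
            question = vlm_agent.ask(sketch_image, html, history)
            answer = simulated_user.answer(question)
            history.append(("agent", question))
            history.append(("user", answer))
        else:
            # Agent passively receives a feedback instruction
            feedback = simulated_user.feedback(sketch_image, html)
            history.append(("user", feedback))

        # Refine the generation conditioned on the accumulated dialogue
        html = vlm_agent.generate(sketch_image, history)

    return html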
Anthology ID: 2025.naacl-long.198
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 3921–3955
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.198/
Cite (ACL): Ryan Li, Yanzhe Zhang, and Diyi Yang. 2025. Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3921–3955, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping (Li et al., NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.198.pdf