Xiaokun Yang

2026

Large language models (LLMs) for code generation have made remarkable progress in synthesizing functional code from natural language instructions. However, generating visually accurate and structurally sound front-end code that faithfully renders user-intended layouts and interfaces remains a critical challenge. Most existing work focuses on functional correctness and overlooks the visual fidelity and rendering quality essential to front-end development. To address this gap, we present a data construction and training pipeline that enhances the front-end code generation capabilities of code LLMs. The pipeline uses three training stages: continual pre-training on synthetic data, quality-controlled supervised fine-tuning, and reinforcement learning with checklist-based rewards. Our evaluation on front-end code generation benchmarks reveals that even strong base models struggle with visual faithfulness and layout complexity. Our fully trained model demonstrates substantial improvements over baseline approaches across all domains and achieves performance competitive with frontier models while maintaining generation efficiency. These results underscore the importance of stage-aligned data curation and vision-grounded optimization in developing reliable front-end code generation systems. Our code and data are open-sourced at https://github.com/leanfeng1/FrontCoder.

Co-authors

Yihang Lou 1

Jing Wang 1

Jian Yang 1

Wei Zhang 1

Venues

Findings 1
