Junyou Su
2025
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
He Zhu | Junyou Su | Minxin Chen | Wen Wang | Yijie Deng | Guanhua Chen | Wenjia Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
In the field of urban planning, existing Vision-Language Models (VLMs) frequently fail to effectively analyze planning maps, which are critical for urban planners and educational contexts. Planning maps require specialized understanding of spatial configurations, regulatory requirements, and multi-scale analysis. To address this challenge, we introduce PlanGPT-VL, the first domain-specific VLM tailored for urban planning maps. PlanGPT-VL employs three innovations: (1) PlanAnno-V framework for high-quality VQA data synthesis, (2) Critical Point Thinking (CPT) to reduce hallucinations through structured verification, and (3) PlanBench-V benchmark for systematic evaluation. Evaluation on PlanBench-V shows that PlanGPT-VL outperforms general-purpose VLMs on planning map interpretation tasks, with our 7B model achieving performance comparable to larger 72B models.
Tag-Instruct: Controlled Instruction Complexity Enhancement through Structure-based Augmentation
He Zhu | Zhiwen Ruan | Junyou Su | Xingwei He | Yun Chen | Wenjia Zhang | Guanhua Chen
Findings of the Association for Computational Linguistics: ACL 2025
High-quality instruction data is crucial for developing large language models (LLMs), yet existing approaches struggle to effectively control instruction complexity. We present Tag-Instruct, a novel framework that enhances instruction complexity through structured semantic compression and controlled difficulty augmentation. Unlike previous prompt-based methods operating on raw text, Tag-Instruct compresses instructions into a compact tag space and systematically enhances complexity through RL-guided tag expansion. Through extensive experiments, we show that Tag-Instruct outperforms existing instruction complexity augmentation approaches. Our analysis reveals that operating in tag space provides superior controllability and stability across different instruction synthesis frameworks.