Yu Shen
2026
From Pixels to Policies: Reinforcing Spatial Reasoning in Language Models for Content-Aware Layout Design
Sha Li | Stefano Petrangeli | Yu Shen | Xiang Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
We introduce LaySPA, a reinforcement learning framework that equips large language models (LLMs) with explicit and interpretable spatial reasoning for content-aware graphic layout design. LaySPA addresses two key challenges: LLMs’ limited spatial reasoning and the lack of transparency in design decision making. Instead of operating at the pixel level, we reformulate layout design as a policy learning problem over a structured textual spatial environment that explicitly encodes canvas geometry, element attributes, and inter-element relationships. LaySPA produces dual-level outputs comprising interpretable reasoning traces and structured layout specifications, enabling transparent and controllable design decision making. Layout design policy is optimized via a multi-objective spatial critique that decomposes layout quality into geometric validity, relational coherence, and aesthetic consistency, and is trained using relative group optimization to stabilize learning in open-ended design spaces. Experiments demonstrate that LaySPA improves structural validity and visual quality, outperforming larger proprietary LLMs and achieving performance comparable to specialized state-of-the-art layout generators while requiring fewer annotated samples.
2025
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
Yue Fan | Handong Zhao | Ruiyi Zhang | Yu Shen | Xin Eric Wang | Gang Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Graphical User Interface (GUI) action grounding, mapping language instructions to actionable elements on GUI screens, is important for assisting users with interactive tutorials, task automation, and accessibility support. Most recent work on GUI action grounding uses large GUI datasets to fine-tune Multimodal Large Language Models (MLLMs). However, the fine-tuning data is inherently limited to specific GUI environments, leading to significant performance degradation in novel environments due to the generalization challenges in the GUI domain. Therefore, we argue that GUI action grounding models should be further aligned with novel environments before deployment to optimize their performance. To address this, we first propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration and then continuously fine-tune GUI grounding models with the collected data. To ensure the GUI action grounding models generalize to various screens within the target novel environment after the continuous fine-tuning, we equip GUI-Bee with a novel Q-value-Incentive In-Context Reinforcement Learning (Q-ICRL) algorithm that optimizes exploration efficiency and exploration data quality. In our experiments, we introduce NovelScreenSpot to test how well the data can help align GUI action grounding models to novel environments. Furthermore, we conduct an ablation study to validate that the Q-ICRL method enhances the efficiency of GUI-Bee.
2016
Cross-language Projection of Dependency Trees with Constrained Partial Parsing for Tree-to-Tree Machine Translation
Yu Shen | Chenhui Chu | Fabien Cromieres | Sadao Kurohashi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers