Zouying Cao


2025

LESA: Learnable LLM Layer Scaling-Up
Yifei Yang | Zouying Cao | Xinbei Ma | Yao Yao | Zhi Chen | Libo Qin | Hai Zhao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. However, existing depth scaling-up methods rely on empirical heuristic rules for layer duplication, which result in poorer initialization and slower convergence during continual pre-training. We propose LESA, a novel learnable method for depth scaling-up. By concatenating parameters from each layer and applying Singular Value Decomposition, we uncover latent patterns between layers, suggesting that inter-layer parameters can be learned. LESA uses a neural network to predict the parameters inserted between adjacent layers, enabling better initialization and faster training. Experiments show that LESA outperforms existing baselines, achieving superior performance with less than half the computational cost during continual pre-training. Extensive analyses demonstrate its effectiveness across different model sizes and tasks.
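The abstract above outlines the core mechanism: concatenate each layer's parameters, apply SVD to expose structure shared across layers, and train a small network to predict the parameters of a layer inserted between adjacent ones. Below is a minimal sketch of that idea, not the authors' implementation: the `model_layers` list, the latent rank `k`, the MLP architecture, and the training loop are all illustrative assumptions.

```python
# Minimal sketch of the LESA idea described in the abstract (assumed details, not the paper's code).
import torch
import torch.nn as nn

def layer_matrix(layers):
    # One flattened parameter vector per layer -> matrix of shape (num_layers, dim)
    return torch.stack([torch.cat([p.detach().flatten() for p in l.parameters()]) for l in layers])

W = layer_matrix(model_layers)                       # hypothetical list of transformer layers
U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # uncover low-rank structure across layers
k = 8                                                # assumed latent rank
Z = U[:, :k] * S[:k]                                 # per-layer latent codes, shape (L, k)

# Small predictor: from the codes of two surrounding layers, predict the code of the layer between them.
predictor = nn.Sequential(nn.Linear(2 * k, 4 * k), nn.GELU(), nn.Linear(4 * k, k))
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
for _ in range(1000):
    i = torch.randint(0, Z.size(0) - 2, (1,)).item()
    pred = predictor(torch.cat([Z[i], Z[i + 2]]))    # train by reconstructing an existing middle layer
    loss = nn.functional.mse_loss(pred, Z[i + 1])
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Scaling up: for any adjacent pair (i, i+1), decode the predicted code back to parameter space
# and use it to initialize a newly inserted layer before continual pre-training.
z_new = predictor(torch.cat([Z[i], Z[i + 1]]))
w_new = z_new @ Vh[:k]                               # flattened parameters of the inserted layer
```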

PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
Zouying Cao | Runze Wang | Yifei Yang | Xinbei Ma | Xiaoyong Zhu | Bo Zheng | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Model (LLM) agents have demonstrated impressive capabilities in handling complex interactive problems. Existing LLM agents mainly generate natural language (NL) plans to guide reasoning, which are verbose and inefficient. NL plans are also tailored to specific tasks and restrict agents’ ability to generalize across similar tasks. To this end, we explore pseudocode-style plans (P-code Plan) to capture the structural logic of reasoning. We find that P-code Plan empowers LLM agents with stronger generalization ability and greater efficiency. Inspired by this finding, we propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning. With two planning-oriented rewards, PGPO further enhances LLM agents’ ability to generate high-quality P-code Plans and subsequent reasoning. Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines. Analyses reveal the advantage of PGPO in reducing action errors and omissions during reasoning.
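The abstract describes preference optimization guided by planning-oriented rewards but does not spell out the objective. The sketch below shows a standard DPO-style preference loss that such a method could build on; the `plan_reward` helper and the pair-ranking step are hypothetical stand-ins for the paper's two planning-oriented rewards, which are not specified here.

```python
# Hedged sketch of a DPO-style preference objective; assumed backbone, not PGPO's actual formulation.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO objective: increase the policy's margin for the preferred trajectory
    # relative to a frozen reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(margin).mean()

def rank_pair(traj_a, traj_b, plan_reward):
    # Hypothetical helper: score two sampled trajectories (plan + reasoning) with a
    # planning-oriented reward and return them as (chosen, rejected).
    return (traj_a, traj_b) if plan_reward(traj_a) >= plan_reward(traj_b) else (traj_b, traj_a)
```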

2024

Head-wise Shareable Attention for Large Language Models
Zouying Cao | Yifei Yang | Hai Zhao
Findings of the Association for Computational Linguistics: EMNLP 2024

LaCo: Large Language Model Pruning via Layer Collapse
Yifei Yang | Zouying Cao | Hai Zhao
Findings of the Association for Computational Linguistics: EMNLP 2024