Yibo Zhong


2025

UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter Efficient Fine-Tuning of Large Models
Xueyan Zhang | Jinman Zhao | Zhifei Yang | Yibo Zhong | Shuhao Guan | Linbo Cao | Yining Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper introduces UoRA, a novel parameter-efficient fine-tuning (PEFT) approach for large language models (LLMs). UoRA achieves state-of-the-art efficiency by leveraging a low-rank approximation method that reduces the number of trainable parameters without compromising performance. Unlike existing methods such as LoRA and VeRA, UoRA employs a re-parametrization mechanism that eliminates the need to adapt frozen projection matrices while maintaining shared projection layers across the model. This halves the trainable parameters relative to LoRA and outperforms VeRA in computation and storage efficiency. Comprehensive experiments across various benchmarks demonstrate UoRA’s superiority in achieving competitive fine-tuning performance with minimal computational overhead. We demonstrate its performance on the GLUE and E2E benchmarks and its effectiveness in instruction-tuning large language models and image classification models. Our contributions establish a new paradigm for scalable and resource-efficient fine-tuning of LLMs.
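The abstract describes projection layers that are shared across the model and kept frozen, with only a small per-layer re-parametrization trained. Below is a minimal sketch of that general frozen-shared-projection pattern, assuming a VeRA-style trainable scaling-vector parametrization; the class name, shapes, and initialization are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: frozen projections A, B shared by all adapted layers;
# only small per-layer vectors are trained. Not the UoRA code.
import torch
import torch.nn as nn


class SharedProjectionAdapter(nn.Module):
    """One adapted linear layer: y = W0 x + b_vec * (B (d_vec * (A x)))."""

    def __init__(self, base_linear: nn.Linear, shared_A: torch.Tensor, shared_B: torch.Tensor):
        super().__init__()
        self.base = base_linear                  # pretrained weight W0, frozen
        for p in self.base.parameters():
            p.requires_grad = False
        # Frozen projections shared by every adapted layer (stored as buffers, never updated).
        self.register_buffer("A", shared_A)      # (r, in_features)
        self.register_buffer("B", shared_B)      # (out_features, r)
        r = shared_A.shape[0]
        # Only these small per-layer vectors receive gradients.
        self.d = nn.Parameter(torch.ones(r))
        self.b = nn.Parameter(torch.zeros(base_linear.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = ((self.d * (x @ self.A.T)) @ self.B.T) * self.b
        return self.base(x) + delta


# Usage: a single (A, B) pair is drawn once and reused by every adapted layer.
in_f, out_f, r = 768, 768, 8
shared_A = torch.randn(r, in_f) / r ** 0.5
shared_B = torch.randn(out_f, r) / r ** 0.5
layer = SharedProjectionAdapter(nn.Linear(in_f, out_f), shared_A, shared_B)
y = layer(torch.randn(4, in_f))
```

In a full model the same `shared_A` and `shared_B` tensors would be held in one place and passed to every adapted layer, so only the per-layer vectors `d` and `b` add to the stored checkpoint.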

Low-Rank Interconnected Adaptation across Layers
Yibo Zhong | Jinman Zhao | Yao Zhou
Findings of the Association for Computational Linguistics: ACL 2025

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning (PEFT) method that learns weight updates ΔW = AB for pretrained weights W through low-rank adapters A and B. While LoRA ensures hardware efficiency, its low-rank weight updates limit adaptation performance. In this paper, we propose low-rank interconnected adaptation across layers (Lily), a novel PEFT method that introduces an interconnected framework with locally shared A and globally shared B experts. This structure eliminates redundant per-layer AB pairs, enabling higher-rank ΔW with equal or fewer parameters. To enhance expressiveness, we use data-dependent routers to determine A-B interconnections, preventing B experts from converging to the same behavior and improving representational power across domains. Experiments across modalities, architectures, and model sizes demonstrate Lily’s superior performance and efficiency.
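The abstract describes a low-rank A projection feeding a pool of shared B experts whose contributions are combined by a data-dependent router. The sketch below illustrates that interconnection pattern under stated assumptions; the class name, router design, and softmax mixing are illustrative, not Lily's actual code.

```python
# Hypothetical sketch: locally shared A, globally shared B experts,
# and a data-dependent router mixing the experts' outputs. Not the Lily code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterconnectedLowRankAdapter(nn.Module):
    def __init__(self, base_linear: nn.Linear, local_A: nn.Linear,
                 global_B_experts: nn.ModuleList, rank: int):
        super().__init__()
        self.base = base_linear                  # pretrained weight, frozen
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = local_A                         # shared by a local group of layers
        self.B_experts = global_B_experts        # shared globally across all layers
        # Data-dependent router: scores each B expert from the low-rank features.
        self.router = nn.Linear(rank, len(global_B_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.A(x)                                        # (batch, rank)
        weights = F.softmax(self.router(h), dim=-1)          # (batch, num_experts)
        expert_out = torch.stack([B(h) for B in self.B_experts], dim=-1)  # (batch, out, E)
        delta = (expert_out * weights.unsqueeze(1)).sum(dim=-1)
        return self.base(x) + delta


# Usage: the same A could be reused by adjacent layers and the same B experts by all layers,
# so the effective update mixes several B directions instead of one per-layer AB pair.
in_f, out_f, rank, num_experts = 768, 768, 8, 4
A = nn.Linear(in_f, rank, bias=False)
B_experts = nn.ModuleList(nn.Linear(rank, out_f, bias=False) for _ in range(num_experts))
layer = InterconnectedLowRankAdapter(nn.Linear(in_f, out_f), A, B_experts, rank)
y = layer(torch.randn(4, in_f))
```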