Abstract
Fine-tuning all parameters of large language models (LLMs) requires significant computational resources and is time-consuming. Recent parameter-efficient tuning methods such as Adapter tuning, Prefix tuning, and LoRA update only a small subset of parameters, but because gradient computation and backpropagation through the full model are still required, they save only about 30% of the training memory. This paper proposes a novel parameter-efficient tuning method for LLMs that does not calculate their gradients. Leveraging the discernible similarities between the parameter-efficient modules learned for the same task by large and small language models, we put forward a strategy for transferring parameter-efficient modules derived from small language models to much larger ones. To ensure a smooth and effective adaptation, we further introduce a Bridge model that guarantees dimensional consistency while also enabling dynamic interaction between the models. We demonstrate the effectiveness of our method with the T5 and GPT-2 series of language models on the SuperGLUE benchmark. Our method achieves performance comparable to both fine-tuning and parameter-efficient tuning on large language models without gradient-based optimization, and it reduces memory usage by up to 5.7x compared to parameter-efficient tuning.
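The page carries no code, and the abstract leaves the transfer mechanism unspecified, so the sketch below is only one plausible reading of the idea: a LoRA-style adapter is trained on a small model, and a linear projection (standing in for the Bridge model) maps its low-rank factors to the larger model's hidden size so the update can be applied to the frozen LLM without backpropagating through it. All names here (`LoRAModule`, `Bridge`, `d_small`, `d_large`) are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class LoRAModule(nn.Module):
    """Standard low-rank adapter: the weight update is (alpha / r) * B @ A."""
    def __init__(self, d_model: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_model) * 0.01)  # (r, d_model) down-projection
        self.B = nn.Parameter(torch.zeros(d_model, r))         # (d_model, r) up-projection
        self.scale = alpha / r

class Bridge(nn.Module):
    """Hypothetical bridge (not necessarily the paper's design): a linear map
    from the small model's hidden size to the large model's hidden size,
    applied to both low-rank factors of a transferred adapter."""
    def __init__(self, d_small: int, d_large: int):
        super().__init__()
        self.proj = nn.Linear(d_small, d_large, bias=False)

    def forward(self, lora: LoRAModule) -> torch.Tensor:
        A_large = self.proj(lora.A)        # (r, d_large)
        B_large = self.proj(lora.B.T).T    # (d_large, r)
        # Dimension-consistent weight update for the large model.
        return lora.scale * (B_large @ A_large)

# Adapter trained (with gradients) on a small model, e.g. hidden size 512.
lora_small = LoRAModule(d_model=512)
bridge = Bridge(d_small=512, d_large=1024)

# The frozen large model's weight is only read, never backpropagated through.
W_large = torch.randn(1024, 1024)
with torch.no_grad():
    W_adapted = W_large + bridge(lora_small)
print(W_adapted.shape)  # torch.Size([1024, 1024])
```

Under this reading, only the small adapter and the bridge carry trainable parameters, so no gradient ever flows through the large model itself, which is where the memory savings reported in the abstract would come from.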
- Anthology ID: 2023.emnlp-main.22
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 321–330
- URL: https://aclanthology.org/2023.emnlp-main.22
- DOI: 10.18653/v1/2023.emnlp-main.22
- Cite (ACL): Feihu Jin, Jiajun Zhang, and Chengqing Zong. 2023. Parameter-efficient Tuning for Large Language Model without Calculating Its Gradients. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 321–330, Singapore. Association for Computational Linguistics.
- Cite (Informal): Parameter-efficient Tuning for Large Language Model without Calculating Its Gradients (Jin et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.emnlp-main.22.pdf