@inproceedings{qin-etal-2025-federated,
title = "Federated Data-Efficient Instruction Tuning for Large Language Models",
author = "Qin, Zhen and
Wu, Zhaomin and
He, Bingsheng and
Deng, Shuiguang",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.803/",
pages = "15550--15568",
ISBN = "979-8-89176-256-5",
abstract = "Instruction tuning is a crucial step in improving the responsiveness of pretrained large language models (LLMs) to human instructions. Federated learning (FL) makes it possible to exploit the vast private instruction data held by clients, and has become popular for LLM tuning because it improves data diversity. Existing federated tuning methods simply consume all local data, causing excessive computational overhead and overfitting to local data, while centralized data-efficient solutions are unsuitable for FL due to privacy concerns. This work presents FedHDS, a federated data-efficient instruction tuning approach that tunes LLMs on a representative subset of edge-side data. It reduces data redundancy at both the intra- and inter-client levels without sharing raw data. Experiments with various LLMs, datasets and partitions show that FedHDS improves Rouge-L on unseen tasks by an average of 10.72{\%} over state-of-the-art full-data federated instruction tuning methods while using less than 1.5{\%} of the data samples, improving training efficiency by up to tens of times."
}
Markdown (Informal)
[Federated Data-Efficient Instruction Tuning for Large Language Models](https://aclanthology.org/2025.findings-acl.803/) (Qin et al., Findings 2025)
ACL
Zhen Qin, Zhaomin Wu, Bingsheng He, and Shuiguang Deng. 2025. Federated Data-Efficient Instruction Tuning for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15550–15568, Vienna, Austria. Association for Computational Linguistics.
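
The abstract describes FedHDS as tuning on a small, representative subset of each client's instruction data, chosen locally so that raw samples never leave the device. The sketch below is a minimal illustration of that idea only, not the paper's actual algorithm: it assumes a clustering-based selection heuristic, a hypothetical `embed_samples` helper standing in for features from the locally held LLM, and a 1.5% budget taken from the abstract's reported data usage.

```python
# Illustrative sketch of intra-client representative-subset selection.
# The clustering heuristic and `embed_samples` helper are assumptions for
# illustration; they are not the FedHDS algorithm from the paper.
import numpy as np
from sklearn.cluster import KMeans


def embed_samples(samples):
    """Hypothetical helper: map each instruction sample to a feature vector
    (e.g., hidden states from the client's local LLM). Stubbed with random
    features so the sketch runs end to end."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(samples), 64))


def select_representative_subset(samples, budget_ratio=0.015):
    """Keep roughly budget_ratio of a client's samples by clustering their
    embeddings and retaining the sample closest to each cluster centroid."""
    feats = embed_samples(samples)
    k = max(1, int(len(samples) * budget_ratio))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
    chosen = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])
    return [samples[i] for i in sorted(chosen)]


if __name__ == "__main__":
    client_data = [f"instruction sample {i}" for i in range(1000)]
    subset = select_representative_subset(client_data)
    print(f"kept {len(subset)} of {len(client_data)} samples")
```

The inter-client redundancy reduction mentioned in the abstract, and the federated fine-tuning loop that would consume the selected subsets, are outside the scope of this sketch.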