Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization
Shoujie Tong, Heming Xia, Damai Dai, Runxin Xu, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui
Abstract
Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit, which degrades performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from multiple sub-nets dynamically generated by dropout. Bi-Drop performs sub-net estimation in an in-batch manner, so it avoids the hysteresis in sub-net updating that affects previous methods relying on asynchronous sub-net estimation. Moreover, Bi-Drop needs only one mini-batch to estimate a sub-net, so it makes better use of the training data. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods. Empirical results further show that Bi-Drop exhibits excellent generalization ability and robustness under domain transfer, data imbalance, and low-resource scenarios.

- Anthology ID: 2023.findings-emnlp.346
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 5214–5227
- URL: https://aclanthology.org/2023.findings-emnlp.346
- DOI: 10.18653/v1/2023.findings-emnlp.346
- Cite (ACL): Shoujie Tong, Heming Xia, Damai Dai, Runxin Xu, Tianyu Liu, Binghuai Lin, Yunbo Cao, and Zhifang Sui. 2023. Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5214–5227, Singapore. Association for Computational Linguistics.
- Cite (Informal): Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization (Tong et al., Findings 2023)
- PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2023.findings-emnlp.346.pdf
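The core idea sketched in the abstract — sampling several dropout sub-nets from a single mini-batch and updating only the parameters those sub-nets' gradients single out — can be illustrated with a toy example. This is a minimal sketch on a scalar linear model, not the authors' implementation: the function name `bidrop_step`, the mean-gradient-magnitude selection heuristic, and all hyperparameters are illustrative assumptions, not the exact algorithm from the paper.

```python
import random

def bidrop_step(w, x, y, n_subnets=4, drop_p=0.5, lr=0.1, keep_ratio=0.5):
    """Toy Bi-Drop-style update for a linear model y_hat = w . x.

    For ONE mini-batch (here a single example), sample several dropout
    sub-nets, collect per-parameter gradients from each, and update only
    the parameters whose gradients are largest on average across sub-nets.
    Selection rule and names are illustrative, not the paper's algorithm.
    """
    grads = []
    for _ in range(n_subnets):
        # Sample an inverted-dropout mask over inputs -> one sub-net.
        mask = [0.0 if random.random() < drop_p else 1.0 / (1.0 - drop_p)
                for _ in w]
        xm = [xi * mi for xi, mi in zip(x, mask)]
        y_hat = sum(wi * xi for wi, xi in zip(w, xm))
        err = y_hat - y                       # dL/dy_hat for 0.5*(y_hat-y)^2
        grads.append([err * xi for xi in xm])  # gradient w.r.t. each weight
    # Score each parameter by mean |gradient| across the sampled sub-nets.
    n = len(w)
    score = [sum(abs(g[i]) for g in grads) / n_subnets for i in range(n)]
    k = max(1, int(keep_ratio * n))
    chosen = set(sorted(range(n), key=lambda i: -score[i])[:k])
    # Update only the selected parameters with the averaged gradient.
    mean_grad = [sum(g[i] for g in grads) / n_subnets for i in range(n)]
    return [wi - lr * mean_grad[i] if i in chosen else wi
            for i, wi in enumerate(w)]
```

Because the sub-nets are sampled and scored within the same mini-batch, the selection reflects the current parameters rather than a stale estimate, which is the in-batch ("synchronous") property the abstract emphasizes over asynchronous estimation.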