DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining
Weifeng Jiang, Qianren Mao, Chenghua Lin, Jianxin Li, Ting Deng, Weiyi Yang, Zheng Wang
Abstract
Many text mining models are constructed by fine-tuning a large deep pre-trained language model (PLM) on downstream tasks. However, a significant challenge nowadays is maintaining performance when using a lightweight model with limited labeled samples. We present DisCo, a semi-supervised learning (SSL) framework for fine-tuning a cohort of small student models generated from a large PLM via knowledge distillation. Our key insight is to share complementary knowledge among the distilled student cohort to promote their SSL effectiveness. DisCo employs a novel co-training technique to optimize a cohort of multiple small student models by promoting knowledge sharing among students under diversified views: model views produced by different distillation strategies, and data views produced by various input augmentations. We evaluate DisCo on both semi-supervised text classification and extractive summarization tasks. Experimental results show that DisCo can produce student models that are 7.6× smaller and 4.8× faster in inference than the baseline PLMs while maintaining comparable performance. We also show that DisCo-generated student models outperform similar-sized models elaborately tuned for distinct tasks.
- Anthology ID: 2023.emnlp-main.244
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 4015–4030
- URL: https://aclanthology.org/2023.emnlp-main.244
- DOI: 10.18653/v1/2023.emnlp-main.244
- Cite (ACL): Weifeng Jiang, Qianren Mao, Chenghua Lin, Jianxin Li, Ting Deng, Weiyi Yang, and Zheng Wang. 2023. DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4015–4030, Singapore. Association for Computational Linguistics.
- Cite (Informal): DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining (Jiang et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/fix-volume-bibkeys/2023.emnlp-main.244.pdf
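To make the co-training idea in the abstract concrete, below is a minimal sketch of what one DisCo-style training step could look like: each distilled student fits the labeled batch with cross-entropy, while a consistency term encourages the students to agree on unlabeled inputs seen under different augmented views ("data views"). This is not the authors' released implementation; the function `disco_step`, its signature, the KL-based consistency term, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of a DisCo-style co-training step (not the paper's code).
import torch
import torch.nn.functional as F

def disco_step(students, labeled_x, labels, unlabeled_views, lam=1.0):
    """One co-training step for a cohort of distilled student models.

    students:        list of nn.Module classifiers (distilled from one PLM)
    labeled_x:       batch of labeled inputs, shared by all students
    labels:          gold labels for labeled_x
    unlabeled_views: list of augmented views of the same unlabeled batch,
                     one view per student (the "data views")
    lam:             weight of the consistency term (assumed hyperparameter)
    """
    # Supervised term: every student fits the labeled batch.
    sup_loss = sum(F.cross_entropy(s(labeled_x), labels) for s in students)

    # Forward pass of every student on its own unlabeled view.
    logits = [s(v) for s, v in zip(students, unlabeled_views)]

    # Consistency term: each student matches its peers' soft predictions,
    # with each peer having seen a *different* view of the same inputs.
    cons_loss = 0.0
    for i in range(len(students)):
        log_p_i = F.log_softmax(logits[i], dim=-1)
        for j in range(len(students)):
            if i == j:
                continue
            p_j = F.softmax(logits[j], dim=-1).detach()  # peer acts as a fixed target
            cons_loss = cons_loss + F.kl_div(log_p_i, p_j, reduction="batchmean")

    return sup_loss + lam * cons_loss
```

In the abstract's terms, the per-student augmented inputs play the role of data views, while the model views would come from instantiating the cohort with students produced by different distillation strategies.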