Rethinking Task-Specific Knowledge Distillation: Contextualized Corpus as Better Textbook
Chang Liu, Chongyang Tao, Jianxin Liang, Tao Shen, Jiazhan Feng, Quzhe Huang, Dongyan Zhao
Abstract
Knowledge distillation has proven effective for customizing small language models to specific tasks. Here, a corpus serves as the ‘textbook’ and plays an indispensable role: it is the only medium through which the teacher can teach the student. Prevailing methods adopt a two-stage distillation paradigm: general distillation first, on a task-agnostic general corpus, followed by task-specific distillation on an augmented task-specific corpus. We argue that such a paradigm may not be optimal. In general distillation, it is extravagant to let the diverse but desultory general knowledge overwhelm the limited capacity of the student model. In task-specific distillation, by contrast, the task corpus is usually limited and narrow, preventing the student from learning enough knowledge. To mitigate the issues caused by these two gapped corpora, we present a better textbook for the student to learn from: a contextualized corpus that contextualizes the task corpus with a large-scale general corpus through relevance-based text retrieval. Experimental results on the GLUE benchmark demonstrate that the contextualized corpus is a better textbook than jointly using the general corpus and the augmented task-specific corpus. Surprisingly, it enables task-specific distillation from scratch, without general distillation, while maintaining comparable performance, making it more flexible to customize the student model to a desired size under various computation constraints.
- Anthology ID:
- 2022.emnlp-main.729
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10652–10658
- URL:
- https://aclanthology.org/2022.emnlp-main.729
- Cite (ACL):
- Chang Liu, Chongyang Tao, Jianxin Liang, Tao Shen, Jiazhan Feng, Quzhe Huang, and Dongyan Zhao. 2022. Rethinking Task-Specific Knowledge Distillation: Contextualized Corpus as Better Textbook. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10652–10658, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Rethinking Task-Specific Knowledge Distillation: Contextualized Corpus as Better Textbook (Liu et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.729.pdf
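The abstract describes building the contextualized corpus by pairing each task example with relevant texts retrieved from a large general corpus. The paper's exact retriever and settings are not specified in the abstract, so the following is only a minimal sketch: it assumes TF-IDF vectors with cosine similarity as the relevance measure and an illustrative `top_k` cutoff, both stand-ins for whatever relevance-based retrieval the authors actually use.

```python
# Hedged sketch: contextualize a task corpus with a general corpus via
# relevance-based retrieval. TF-IDF + cosine similarity and top_k=5 are
# illustrative assumptions, not the paper's stated configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_contextualized_corpus(task_corpus, general_corpus, top_k=5):
    """For each task example, retrieve the top_k most relevant general-corpus
    texts and pair them with the example to form a distillation 'textbook'."""
    vectorizer = TfidfVectorizer()
    general_vecs = vectorizer.fit_transform(general_corpus)
    task_vecs = vectorizer.transform(task_corpus)

    # Similarity matrix of shape (n_task, n_general).
    sims = cosine_similarity(task_vecs, general_vecs)

    contextualized = []
    for example, scores in zip(task_corpus, sims):
        top_idx = scores.argsort()[::-1][:top_k]  # indices of best matches
        context = [general_corpus[i] for i in top_idx]
        contextualized.append({"task_text": example, "context": context})
    return contextualized


if __name__ == "__main__":
    task = ["the movie was surprisingly good"]
    general = [
        "a thrilling film with great acting",
        "stock prices fell sharply today",
        "the novel was a pleasure to read",
    ]
    print(build_contextualized_corpus(task, general, top_k=2))
```

In this sketch, the retrieved `context` texts would then be fed alongside (or in place of) the original task examples during distillation, so the student sees task-relevant slices of the general corpus rather than the full, unfocused general corpus or the narrow task corpus alone.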