Abstract
We propose VE-KD, a novel method that balances knowledge distillation and vocabulary expansion with the aim of training efficient domain-specific language models. Compared with traditional pre-training approaches, VE-KD achieves competitive performance on downstream tasks while reducing model size and using fewer computational resources. Additionally, VE-KD avoids overfitting during domain adaptation. Our experiments on various biomedical domain tasks show that VE-KD performs well compared with models such as BioBERT (+1% on HoC) and PubMedBERT (+1% on PubMedQA), with about 96% less training time. Furthermore, it outperforms DistilBERT and Adapt-and-Distill, showing a significant improvement on document-level tasks. Investigation of vocabulary size and tolerance, which are hyperparameters of our method, provides insights for further model optimization. The fact that VE-KD consistently maintains its advantages even when the corpus size is small suggests that it is a practical approach for domain-specific language tasks and is transferable to different domains for broader applications.
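The abstract names the two ingredients (a distillation objective plus an expanded student vocabulary) without spelling out the training objective, so the following is a minimal, hypothetical sketch of how such a combination is commonly set up, not the authors' VE-KD implementation; the vocabulary sizes, the number of added domain tokens, the temperature, and the 0.5/0.5 loss weighting are all assumptions for illustration.

```python
# Hypothetical sketch (not the paper's VE-KD): a distillation loss combined with
# an expanded student vocabulary. The teacher scores only the original vocabulary,
# so the soft-target term is restricted to that shared slice, while the hard-label
# term covers the full expanded vocabulary.
import torch
import torch.nn.functional as F

orig_vocab = 30522          # teacher / original vocabulary size (assumed)
new_domain_tokens = 5000    # number of added domain tokens (assumed hyperparameter)
student_vocab = orig_vocab + new_domain_tokens
temperature = 2.0           # softening temperature (assumed)

# Toy logits for a single masked position, standing in for real model outputs.
teacher_logits = torch.randn(1, orig_vocab)
student_logits = torch.randn(1, student_vocab, requires_grad=True)

# Distillation term: match the teacher's softened distribution on the shared vocabulary.
kd_loss = F.kl_div(
    F.log_softmax(student_logits[:, :orig_vocab] / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

# Task term: ordinary masked-LM cross-entropy over the expanded vocabulary,
# which is what lets the student learn the newly added domain tokens.
gold_token_id = torch.tensor([orig_vocab + 42])   # e.g. one of the new domain tokens
mlm_loss = F.cross_entropy(student_logits, gold_token_id)

loss = 0.5 * kd_loss + 0.5 * mlm_loss   # weighting is illustrative only
loss.backward()
```

In this sketch the teacher supervises only the vocabulary it knows, while the hard-label term over the expanded vocabulary is one straightforward way to give the added domain tokens a training signal.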
- Anthology ID: 2024.findings-emnlp.884
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 15046–15059
- URL: https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.884/
- DOI: 10.18653/v1/2024.findings-emnlp.884
- Cite (ACL): Pengju Gao, Tomohiro Yamasaki, and Kazunori Imoto. 2024. VE-KD: Vocabulary-Expansion Knowledge-Distillation for Training Smaller Domain-Specific Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 15046–15059, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): VE-KD: Vocabulary-Expansion Knowledge-Distillation for Training Smaller Domain-Specific Language Models (Gao et al., Findings 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.884.pdf