Abstract
The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods to make exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting to tasks other than classification. Experiments demonstrate improved early exiting for BERT, with better trade-offs obtained by the proposed fine-tuning strategy, successful application to regression tasks, and the possibility to combine it with other acceleration methods. Source code can be found at https://github.com/castorini/berxit.
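As a rough illustration of the early exiting idea summarized above, the sketch below attaches a prediction head and a small learned exit scorer to every layer of a toy transformer stack and stops inference once the scorer's output passes a threshold. This mirrors the learning-to-exit module only at a high level: a learned exit score, unlike softmax confidence, also applies to regression outputs. All names here (`EarlyExitEncoder`, `infer`, `exit_threshold`) are illustrative assumptions, not the interface of the released code in the linked repository.

```python
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    """Toy transformer stack where every layer has its own prediction head
    plus a small learned exit scorer, so inference can stop at any layer."""

    def __init__(self, hidden=64, num_layers=6, num_labels=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        # One output head per layer: the "off-ramp" that produces an early prediction.
        self.heads = nn.ModuleList(nn.Linear(hidden, num_labels) for _ in range(num_layers))
        # Learned exit scorer per layer: estimates whether the current prediction is
        # already good enough. Unlike softmax confidence, this also works when the
        # head is a regressor (num_labels == 1) and no class probability exists.
        self.exit_scorers = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_layers))

    @torch.no_grad()
    def infer(self, embedded, exit_threshold=0.9):
        """Run layer by layer on a single example; return (prediction, exit layer)."""
        x = embedded
        for i, (layer, head, scorer) in enumerate(
            zip(self.layers, self.heads, self.exit_scorers)
        ):
            x = layer(x)
            pooled = x[:, 0]                            # [CLS]-style pooling
            prediction = head(pooled)
            exit_score = torch.sigmoid(scorer(pooled))  # learned "confidence" in (0, 1)
            last_layer = i == len(self.layers) - 1
            if exit_score.item() > exit_threshold or last_layer:
                return prediction, i


if __name__ == "__main__":
    model = EarlyExitEncoder().eval()
    tokens = torch.randn(1, 16, 64)  # (batch=1, seq_len, hidden): stand-in for embedded input
    logits, exit_layer = model.infer(tokens, exit_threshold=0.9)
    print(f"exited at layer {exit_layer} with prediction {logits}")
```

Lowering `exit_threshold` makes the model exit earlier (faster inference, potentially lower quality), while raising it defers more examples to deeper layers; this single knob is what produces the quality-efficiency trade-off curves reported in the paper.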
- Anthology ID: 2021.eacl-main.8
- Volume: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
- Month: April
- Year: 2021
- Address: Online
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 91–104
- URL: https://aclanthology.org/2021.eacl-main.8
- DOI: 10.18653/v1/2021.eacl-main.8
- Cite (ACL): Ji Xin, Raphael Tang, Yaoliang Yu, and Jimmy Lin. 2021. BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 91–104, Online. Association for Computational Linguistics.
- Cite (Informal): BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression (Xin et al., EACL 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.eacl-main.8.pdf
- Code: castorini/berxit
- Data: GLUE, QNLI, SICK