Abstract
Pre-trained language models like BERT perform well across a wide range of natural language tasks. However, they are resource-intensive and computationally expensive for industrial scenarios. Early exits are therefore added at each layer of BERT to perform adaptive computation: easier samples are predicted by the first few layers, which speeds up inference. In this work, to improve efficiency without a drop in performance, we propose a novel training scheme called Learned Early Exit for BERT (LeeBERT). First, we ask the exits to learn from each other, rather than only from the last layer. Second, the weights of the different loss terms are learned, thus balancing the different objectives. We formulate the optimization of LeeBERT as a bi-level optimization problem, and we propose a novel cross-level optimization (CLO) algorithm to improve the optimization results. Experiments on the GLUE benchmark show that our proposed methods improve on the performance of state-of-the-art (SOTA) early exit methods for pre-trained models.
- Anthology ID: 2021.acl-long.231
- Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month: August
- Year: 2021
- Address: Online
- Venues: ACL | IJCNLP
- Publisher: Association for Computational Linguistics
- Pages: 2968–2980
- URL: https://aclanthology.org/2021.acl-long.231
- DOI: 10.18653/v1/2021.acl-long.231
- Cite (ACL): Wei Zhu. 2021. LeeBERT: Learned Early Exit for BERT with cross-level optimization. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2968–2980, Online. Association for Computational Linguistics.
- Cite (Informal): LeeBERT: Learned Early Exit for BERT with cross-level optimization (Zhu, ACL-IJCNLP 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.acl-long.231.pdf
- Data: GLUE, QNLI
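
The adaptive computation described in the abstract, where easier samples are predicted by the first few layers, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes an entropy-based exit criterion, which the abstract does not specify, and the names `entropy`, `early_exit_predict`, `exit_probs`, and `threshold` are hypothetical. The per-layer class distributions are supplied directly instead of being computed by a BERT model.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(
    exit_probs: List[Sequence[float]], threshold: float
) -> Tuple[int, int]:
    """Return (predicted_class, exit_layer) for a single sample.

    exit_probs[i] is the class distribution produced by the exit attached
    to layer i + 1. Assumption: a sample leaves at the first exit whose
    entropy falls below `threshold`; the last layer always answers.
    """
    if not exit_probs:
        raise ValueError("exit_probs must contain at least one layer")
    last_layer = len(exit_probs)
    for layer, probs in enumerate(exit_probs, start=1):
        confident = entropy(probs) < threshold
        if confident or layer == last_layer:
            predicted = max(range(len(probs)), key=lambda c: probs[c])
            return predicted, layer
    raise AssertionError("unreachable: the last layer always returns")


# An "easy" sample is confidently classified by the first exit,
# while a "hard" sample runs through all three layers.
easy = [[0.95, 0.05], [0.99, 0.01], [0.99, 0.01]]
hard = [[0.50, 0.50], [0.60, 0.40], [0.20, 0.80]]
print(early_exit_predict(easy, threshold=0.3))
print(early_exit_predict(hard, threshold=0.3))
```

In this sketch a lower `threshold` sends more samples to deeper layers and a higher one exits earlier, so the value controls the speed and accuracy trade-off at inference time.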