PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
Zhen Zhang, Wei Zhu, Jinfan Zhang, Peng Wang, Rize Jin, Tae-Sun Chung
Abstract
BERT and other pretrained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (CITATION), their significant inference latency prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and alongside popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early exiting decision if a sufficient number (the patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer's prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT.
- Anthology ID:
- 2022.findings-naacl.25
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2022
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 327–338
- URL:
- https://aclanthology.org/2022.findings-naacl.25
- DOI:
- 10.18653/v1/2022.findings-naacl.25
- Cite (ACL):
- Zhen Zhang, Wei Zhu, Jinfan Zhang, Peng Wang, Rize Jin, and Tae-Sun Chung. 2022. PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 327–338, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting (Zhang et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.findings-naacl.25.pdf
- Code
- michael-wzhu/pcee-bert
- Data
- CIFAR-10, CIFAR-100, GLUE, QNLI
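The exiting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the patience and threshold values below are hypothetical, and the per-layer probability lists stand in for the softmax outputs of the intermediate classifiers of a multi-exit BERT. Inference stops as soon as `patience` consecutive layers each have prediction entropy below the confidence threshold:

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pcee_exit_layer(layer_probs, patience=2, threshold=0.3):
    """Return the (1-indexed) layer at which inference stops.

    Exits early once `patience` consecutive intermediate classifiers
    are confident, i.e. their prediction entropy falls below
    `threshold`; otherwise all layers are run.
    """
    confident_streak = 0
    for i, probs in enumerate(layer_probs, start=1):
        if entropy(probs) < threshold:
            confident_streak += 1
            if confident_streak >= patience:
                return i
        else:
            confident_streak = 0  # confidence must be consecutive
    return len(layer_probs)

# Toy example: layers 3 and 4 are both confident -> exit at layer 4.
probs_per_layer = [
    [0.50, 0.50],   # entropy ~ 0.69 (uncertain)
    [0.60, 0.40],   # entropy ~ 0.67 (uncertain)
    [0.95, 0.05],   # entropy ~ 0.20 (confident)
    [0.97, 0.03],   # entropy ~ 0.13 (confident)
    [0.99, 0.01],
    [0.99, 0.01],
]
print(pcee_exit_layer(probs_per_layer, patience=2, threshold=0.3))  # -> 4
```

Raising `patience` or lowering `threshold` trades speed for accuracy, which is how the paper obtains different speed-up ratios from a single trained model.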