Abstract
In the past few years, pre-trained BERT has become one of the most popular deep-learning language models due to its remarkable performance on natural language processing (NLP) tasks. However, this superior performance comes at the cost of high computational and memory complexity, hindering its envisioned widespread deployment on edge devices with limited computing resources. Binarization can alleviate these limitations by reducing storage requirements and improving computing performance. However, obtaining accuracy comparable to that of full-precision BERT with a binary BERT remains a difficult task. We observe that direct binarization of pre-trained BERT provides a poor initialization for the fine-tuning phase, making the model incapable of achieving decent accuracy on downstream tasks. Based on this observation, we put forward the following hypothesis: a partially randomly-initialized BERT with binary weights and activations can reach decent accuracy by distilling knowledge from its full-precision counterpart. We show that BERT with a pre-trained embedding layer and a randomly-initialized encoder is a smoking gun for this hypothesis. We identify the smoking gun through a series of experiments and show that it yields a new set of state-of-the-art results on the GLUE and SQuAD benchmarks.
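As a rough illustration of the recipe the abstract describes, the sketch below (PyTorch with HuggingFace `transformers`; `bert-base-uncased` and the sign-based binarization with a straight-through estimator are assumptions, not the paper's exact implementation) shows a student BERT that keeps the teacher's pre-trained embedding layer while its encoder stays randomly initialized, plus a binarized linear layer of the kind such a student would use.

```python
# A minimal sketch, assuming PyTorch and HuggingFace transformers; it is
# illustrative of the general recipe, not the paper's exact implementation.
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class BinarizeSTE(torch.autograd.Function):
    """sign(x) in the forward pass; straight-through gradient clipped to |x| <= 1."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class BinaryLinear(nn.Linear):
    """nn.Linear whose weights and inputs are binarized to {-1, +1}."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        x_bin = BinarizeSTE.apply(x)
        return nn.functional.linear(x_bin, w_bin, self.bias)


# Full-precision teacher with pre-trained weights, and a student that is
# randomly initialized everywhere except the embedding layer, which is
# copied from the teacher (the "partially-random initialization").
teacher = BertModel.from_pretrained("bert-base-uncased")
student = BertModel(BertConfig())  # random initialization throughout
student.embeddings.load_state_dict(teacher.embeddings.state_dict())

# The student's encoder layers would then be binarized (e.g. by replacing
# their nn.Linear modules with BinaryLinear) and fine-tuned on a downstream
# task with a knowledge-distillation loss against the teacher's outputs.
```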
- Anthology ID: 2022.findings-emnlp.191
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates
- Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 2603–2612
- URL: https://aclanthology.org/2022.findings-emnlp.191
- DOI: 10.18653/v1/2022.findings-emnlp.191
- Cite (ACL): Arash Ardakani. 2022. Partially-Random Initialization: A Smoking Gun for Binarization Hypothesis of BERT. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2603–2612, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal): Partially-Random Initialization: A Smoking Gun for Binarization Hypothesis of BERT (Ardakani, Findings 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2022.findings-emnlp.191.pdf