Partially-Random Initialization: A Smoking Gun for Binarization Hypothesis of BERT

Arash Ardakani


Abstract
In the past few years, pre-trained BERT has become one of the most popular deep-learning language models due to its remarkable performance on natural language processing (NLP) tasks. However, this superior performance comes at the cost of high computational and memory complexity, hindering its envisioned widespread deployment on edge devices with limited computing resources. Binarization can alleviate these limitations by reducing storage requirements and improving computing performance. However, obtaining an accuracy for binary BERT comparable to that of its full-precision counterpart remains a difficult task. We observe that direct binarization of pre-trained BERT provides a poor initialization for the fine-tuning phase, making the model incapable of achieving decent accuracy on downstream tasks. Based on this observation, we put forward the following hypothesis: a partially randomly-initialized BERT with binary weights and activations can reach decent accuracy by distilling knowledge from its full-precision counterpart. We show that BERT with a pre-trained embedding layer and a randomly-initialized encoder is a smoking gun for this hypothesis. We identify the smoking gun through a series of experiments and show that it yields a new set of state-of-the-art results on the GLUE and SQuAD benchmarks.
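To make the hypothesis concrete, the sketch below illustrates one plausible way to set up the ingredients the abstract describes: a full-precision teacher, a student that keeps the pre-trained embedding layer but randomly re-initializes its encoder, sign binarization with a straight-through estimator, and a soft-label distillation loss. This is a minimal sketch assuming a PyTorch/HuggingFace setup; the specific binarization function, re-initialization call, and distillation objective are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a (clipped) straight-through estimator backward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (clipped STE).
        return grad_output * (x.abs() <= 1).float()

def binarize(x):
    return BinarizeSTE.apply(x)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL distillation between student and full-precision teacher logits."""
    s = torch.log_softmax(student_logits / temperature, dim=-1)
    t = torch.softmax(teacher_logits / temperature, dim=-1)
    return nn.functional.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Full-precision teacher: pre-trained BERT, kept frozen during distillation.
teacher = BertModel.from_pretrained("bert-base-uncased")
teacher.eval()

# Student: partially-random initialization. Loading the same checkpoint keeps the
# pre-trained embedding layer; the encoder is then randomly re-initialized.
student = BertModel.from_pretrained("bert-base-uncased")
student.encoder.apply(student._init_weights)

# During fine-tuning, encoder weights (and activations) would be binarized on the
# fly, e.g. w_b = binarize(w), while distillation_loss pulls the binary student's
# outputs toward the full-precision teacher's.
```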
Anthology ID:
2022.findings-emnlp.191
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2603–2612
URL:
https://aclanthology.org/2022.findings-emnlp.191
DOI:
10.18653/v1/2022.findings-emnlp.191
Cite (ACL):
Arash Ardakani. 2022. Partially-Random Initialization: A Smoking Gun for Binarization Hypothesis of BERT. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2603–2612, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Partially-Random Initialization: A Smoking Gun for Binarization Hypothesis of BERT (Ardakani, Findings 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2022.findings-emnlp.191.pdf