@inproceedings{jianbang-etal-2023-adder,
title = "Adder Encoder for Pre-trained Language Model",
author = "Jianbang, Ding and
Suiyun, Zhang and
Linlin, Li",
editor = "Sun, Maosong and
Qin, Bing and
Qiu, Xipeng and
Jiang, Jing and
Han, Xianpei",
booktitle = "Proceedings of the 22nd Chinese National Conference on Computational Linguistics",
month = aug,
year = "2023",
address = "Harbin, China",
publisher = "Chinese Information Processing Society of China",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.ccl-1.76/",
pages = "898--905",
language = "eng",
    abstract = "{\textquotedblleft}BERT, a pre-trained language model entirely based on attention, has proven to be highly performant for many natural language understanding tasks. However, pre-trained language models (PLMs) are often computationally expensive and can hardly be implemented with limited resources. To reduce energy burden, we introduce adder operations into the Transformer encoder and propose a novel AdderBERT with powerful representation capability. Moreover, we adopt mapping-based distillation to further improve its energy efficiency with an assured performance. Empirical results demonstrate that AdderBERT6 achieves highly competitive performance against that of its teacher BERTBASE on the GLUE benchmark while obtaining a 4.9x reduction in energy consumption.{\textquotedblright}"
}
Markdown (Informal)
[Adder Encoder for Pre-trained Language Model](https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.ccl-1.76/) (Jianbang et al., CCL 2023)
ACL
- Ding Jianbang, Zhang Suiyun, and Li Linlin. 2023. Adder Encoder for Pre-trained Language Model. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 898–905, Harbin, China. Chinese Information Processing Society of China.
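For readers unfamiliar with the "adder operations" mentioned in the abstract, the sketch below shows an AdderNet-style adder layer in NumPy, where the multiply-accumulate of a standard dense layer is replaced by an accumulation of L1 distances. This is an illustrative assumption based on the AdderNet line of work, not code from the AdderBERT paper; the function name `adder_linear` and the exact formulation are hypothetical.

```python
import numpy as np

def adder_linear(x, W):
    """Sketch of an AdderNet-style 'adder' layer (assumed formulation).

    Instead of the multiply-accumulate x @ W, each output is the negated
    sum of absolute differences between the input and a weight column,
    so the dominant operations are additions rather than multiplications.

    x: (batch, d_in) activations
    W: (d_in, d_out) weights
    returns: (batch, d_out), entry [b, j] = -sum_i |x[b, i] - W[i, j]|
    """
    diff = np.abs(x[:, :, None] - W[None, :, :])  # (batch, d_in, d_out)
    return -diff.sum(axis=1)

# Usage: same interface shape-wise as an ordinary dense layer.
x = np.random.randn(2, 4)
W = np.random.randn(4, 3)
print(adder_linear(x, W).shape)  # (2, 3)
```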