Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters

Hongyu Zhao; Hao Tan; Hongyuan Mei

doi:10.18653/v1/2022.emnlp-main.444

Abstract

Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters. Previously proposed adapter architectures are all feed-forward neural networks. In this paper, we investigate the effectiveness of using tiny-attention—i.e., attention with extremely small per-head dimensionality—as adapters. Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions, which is missed by the previously proposed adapters. Moreover, we view its multiple attention heads as a mixture of experts and propose to average their weights during deployment, which further reduces its inference computation cost. On the GLUE benchmark, our tiny-attention adapter outperforms the other parameter-efficient transfer learning methods as well as full fine-tuning while only updating 0.05% of the parameters. On the FewGLUE benchmark, its performance is comparable to that of GPT-3 and PET.
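The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `TinyAttentionAdapter`, the residual connection, the initialization scale, and the choice to average each projection matrix across heads are all assumptions made for illustration. The abstract states only that the per-head dimensionality is extremely small and that head weights are averaged at deployment.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TinyAttentionAdapter:
    """Hypothetical multi-head attention adapter with a tiny per-head size."""

    PARAMS = ("wq", "wk", "wv", "wo")

    def __init__(self, d_model, n_heads=4, d_head=1, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed layout: one (d_model x d_head) projection per head.
        self.wq = rng.normal(0.0, 0.02, (n_heads, d_model, d_head))
        self.wk = rng.normal(0.0, 0.02, (n_heads, d_model, d_head))
        self.wv = rng.normal(0.0, 0.02, (n_heads, d_model, d_head))
        self.wo = rng.normal(0.0, 0.02, (n_heads, d_head, d_model))

    def __call__(self, h):
        # h: (seq_len, d_model) hidden states from the frozen model.
        q = np.einsum("td,hde->hte", h, self.wq)
        k = np.einsum("td,hde->hte", h, self.wk)
        v = np.einsum("td,hde->hte", h, self.wv)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        attn = softmax(scores, axis=-1)      # (heads, seq_len, seq_len)
        ctx = attn @ v                       # (heads, seq_len, d_head)
        update = np.einsum("hte,hed->td", ctx, self.wo)
        # Assumed residual: each position is modified using all positions.
        return h + update

    def averaged(self):
        # Assumed deployment step: average each weight across heads,
        # leaving a single head to evaluate at inference time.
        merged = TinyAttentionAdapter.__new__(TinyAttentionAdapter)
        for name in self.PARAMS:
            setattr(merged, name, getattr(self, name).mean(axis=0, keepdims=True))
        return merged

    def num_params(self):
        return sum(getattr(self, name).size for name in self.PARAMS)


h = np.random.default_rng(1).normal(size=(5, 8))
adapter = TinyAttentionAdapter(d_model=8)
out = adapter(h)
deployed = adapter.averaged()
```

In this sketch, `out` has the same shape as `h`, and changing the hidden state at one position changes the adapter output at every other position, which is the context dependence that a position-wise feed-forward adapter lacks. The merged adapter returned by `averaged()` holds one head's worth of weights, so its forward pass does proportionally less work.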

Anthology ID: 2022.emnlp-main.444
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6626–6638
URL: https://aclanthology.org/2022.emnlp-main.444
DOI: 10.18653/v1/2022.emnlp-main.444
Cite (ACL): Hongyu Zhao, Hao Tan, and Hongyuan Mei. 2022. Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6626–6638, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters (Zhao et al., EMNLP 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-main.444.pdf
Software: 2022.emnlp-main.444.software.zip
