Efficient Learning of Multiple NLP Tasks via Collective Weight Factorization on BERT

Christos Papadopoulos, Yannis Panagakis, Manolis Koubarakis, Mihalis Nicolaou


Abstract
The Transformer architecture continues to show remarkable performance gains in many Natural Language Processing tasks. However, obtaining such state-of-the-art performance on different tasks requires fine-tuning the same model separately for each task, an approach that is demanding in terms of both memory and computing power. In this paper, aiming to improve training efficiency across multiple tasks, we propose to collectively factorize the weights of the multi-head attention module of a pre-trained Transformer. We test the proposed method on fine-tuning multiple natural language understanding tasks, employing BERT-Large as an instantiation of the Transformer and GLUE as the evaluation benchmark. Experimental results show that our method requires training and storing only 1% of the initial model parameters per task, a reduction of two orders of magnitude, while matching or improving the original fine-tuned model's performance on each task. Furthermore, compared to well-known adapter-based alternatives on the GLUE benchmark, our method consistently reaches the same levels of performance while requiring approximately four times fewer total and trainable parameters per task.
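The abstract does not spell out the exact parameterization, but the general idea of collectively factorizing attention weights so that only a small task-specific component is trained can be illustrated with a minimal PyTorch sketch. Everything below (the class name, the rank, and the choice of a low-rank per-task core on top of a frozen pre-trained weight) is an assumption for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskFactorizedLinear(nn.Module):
    """Hypothetical sketch of a collectively factorized attention projection.

    The frozen pre-trained weight W0 is augmented with a low-rank update
    U @ S_t @ V^T, where U and V are shared across tasks and S_t is a small
    task-specific core. Only the cores (a tiny fraction of the model) would
    be trained per task. The paper's exact parameterization may differ.
    """

    def __init__(self, pretrained: nn.Linear, num_tasks: int, rank: int = 8):
        super().__init__()
        out_features, in_features = pretrained.weight.shape
        # Frozen pre-trained weight and bias (not updated during fine-tuning).
        self.register_buffer("weight0", pretrained.weight.detach().clone())
        self.register_buffer("bias0", pretrained.bias.detach().clone())
        # Factors shared by all tasks.
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.V = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        # One small core per task: the only per-task trainable parameters.
        self.cores = nn.Parameter(torch.zeros(num_tasks, rank, rank))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Effective weight for this task: W0 + U @ S_t @ V^T.
        delta = self.U @ self.cores[task_id] @ self.V.t()
        return F.linear(x, self.weight0 + delta, self.bias0)


# Hypothetical usage: wrap one attention projection of a BERT-Large layer.
base = nn.Linear(1024, 1024)  # stands in for a pre-trained projection matrix
layer = TaskFactorizedLinear(base, num_tasks=8, rank=8)
out = layer(torch.randn(2, 16, 1024), task_id=3)
print(out.shape)  # torch.Size([2, 16, 1024])
```

With a rank of 8, each task adds only an 8x8 core per wrapped projection, which is how a factorization of this kind can keep per-task trainable parameters around the 1% level reported in the abstract.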
Anthology ID:
2022.findings-naacl.66
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
882–890
URL:
https://aclanthology.org/2022.findings-naacl.66
DOI:
10.18653/v1/2022.findings-naacl.66
Cite (ACL):
Christos Papadopoulos, Yannis Panagakis, Manolis Koubarakis, and Mihalis Nicolaou. 2022. Efficient Learning of Multiple NLP Tasks via Collective Weight Factorization on BERT. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 882–890, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Efficient Learning of Multiple NLP Tasks via Collective Weight Factorization on BERT (Papadopoulos et al., Findings 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.findings-naacl.66.pdf
Video:
https://preview.aclanthology.org/naacl24-info/2022.findings-naacl.66.mp4
Data
CoLA, GLUE, MRPC, MultiNLI, QNLI, SST, SST-2