Abstract
Fine-tuning a pre-trained language model using annotated data has become the de facto standard for adapting general-purpose pre-trained models like BERT to downstream tasks. However, given the trend toward larger pre-trained models, fine-tuning these models for each downstream task is parameter-inefficient and computationally expensive, making this approach sub-optimal for adoption by NLU systems. In recent years, various approaches have been proposed for parameter-efficient task adaptation, such as Adapter, BitFit, prompt tuning, and prefix tuning. However, most of these efforts insert task-specific parameters between or inside intermediate layers of the pre-trained encoder, resulting in higher computational cost due to back-propagation of errors to all layers. To mitigate this issue, we propose a light but efficient attention-based fusion module which computes task-attuned token representations by aggregating intermediate layer representations from a pre-trained network. Our proposed fusion module trains only 0.0009% of total parameters and achieves performance competitive with the standard fine-tuning approach on various tasks. It is also decoupled from the pre-trained network, making it efficient during computation and scalable during deployment. Finally, we demonstrate that our proposed attention-fusion mechanism can transfer effectively to different languages for further re-use and expansion.
- Anthology ID:
- 2022.findings-naacl.64
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2022
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 857–866
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2022.findings-naacl.64/
- DOI:
- 10.18653/v1/2022.findings-naacl.64
- Cite (ACL):
- Jin Cao, Chandana Satya Prakash, and Wael Hamza. 2022. Attention Fusion: a light yet efficient late fusion mechanism for task adaptation in NLU. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 857–866, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- Attention Fusion: a light yet efficient late fusion mechanism for task adaptation in NLU (Cao et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2022.findings-naacl.64.pdf
- Data
- CoNLL 2003, GLUE, QNLI
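
The abstract describes computing a token representation by attending over the intermediate layer outputs of a frozen pre-trained encoder. The snippet below is a minimal sketch of that general idea, not the authors' implementation: it assumes a single hypothetical trainable query vector (`task_query`) and scaled dot-product scoring, neither of which is specified in the abstract. The names `fuse_layers`, `layer_states`, and `task_query` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer_weights(layer_states: List[Vector], query: Vector) -> List[float]:
    """Attention weights over layers for one token (assumed scaled dot-product)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, state)) / scale
              for state in layer_states]
    return softmax(scores)


def fuse_layers(layer_states: List[Vector], query: Vector) -> Vector:
    """Fuse one token's per-layer hidden states into a single vector.

    layer_states: one hidden vector per encoder layer; the encoder is frozen,
                  so these are treated as fixed inputs.
    query:        hypothetical trainable task vector, the only learned
                  parameter in this sketch.
    """
    weights = layer_weights(layer_states, query)
    dim = len(query)
    # Weighted sum of the layer vectors, one output dimension at a time.
    return [sum(w * state[i] for w, state in zip(weights, layer_states))
            for i in range(dim)]


# Toy example: one token, 3 encoder layers, hidden size 2.
token_layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
task_query = [0.0, 0.0]  # a zero query gives uniform attention over layers
fused = fuse_layers(token_layers, task_query)
```

Because the fusion step only reads the encoder's layer outputs, gradients in a setup like this would need to reach only the query vector (and any task head), which is consistent with the decoupling the abstract emphasizes.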