Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns

Zihan Wang; Jiuxiang Gu; Jason Kuen; Handong Zhao; Vlad Morariu; Ruiyi Zhang; Ani Nenkova; Tong Sun; Jingbo Shang

doi:10.18653/v1/2022.findings-acl.74

Abstract

We present a comprehensive study of sparse attention patterns in Transformer models. We first question the need for pre-training with sparse attention and present experiments showing that an efficient fine-tuning only approach yields a slightly worse but still competitive model. Then we compare the widely used local attention pattern and the less-well-studied global attention pattern, demonstrating that global patterns have several unique advantages. We also demonstrate that a flexible approach to attention, with different patterns across different layers of the model, is beneficial for some tasks. Drawing on this insight, we propose a novel Adaptive Axis Attention method, which learns—during fine-tuning—different attention patterns for each Transformer layer depending on the downstream task. Rather than choosing a fixed attention pattern, the adaptive axis attention method identifies important tokens—for each task and model layer—and focuses attention on those. It does not require pre-training to accommodate the sparse patterns and demonstrates competitive and sometimes better performance against fixed sparse attention patterns that require resource-intensive pre-training.
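The contrast between local and global sparse patterns can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: "local" is taken to mean a banded window around each token, "global" is taken to mean a set of token positions that attend to every token and are attended to by every token, and the per-layer index sets are hand-picked rather than learned during fine-tuning. All function and variable names are hypothetical.

```python
import numpy as np

def local_mask(n, window):
    """Banded pattern: token i may attend to token j when |i - j| <= window."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def global_mask(n, global_idx):
    """Assumed global pattern: chosen tokens see all tokens and are seen by all."""
    mask = np.eye(n, dtype=bool)   # every token keeps attention to itself
    mask[global_idx, :] = True     # chosen tokens attend to every position
    mask[:, global_idx] = True     # every position attends to the chosen tokens
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # disallowed pairs get zero weight
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    q, k, v = (rng.normal(size=(n, d)) for _ in range(3))

    # Hypothetical per-layer choices; in the paper these are learned per task.
    per_layer_globals = [[0], [0, 5]]
    for layer, idx in enumerate(per_layer_globals):
        out = masked_attention(q, k, v, global_mask(n, idx))
        print(layer, out.shape)
```

Each layer here uses a different set of global positions, which mirrors the idea of layer-specific patterns; how those positions are selected during fine-tuning is described in the paper itself and is not reproduced in this sketch.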

Anthology ID: 2022.findings-acl.74
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 916–925
URL: https://aclanthology.org/2022.findings-acl.74
DOI: 10.18653/v1/2022.findings-acl.74
Cite (ACL): Zihan Wang, Jiuxiang Gu, Jason Kuen, Handong Zhao, Vlad Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun, and Jingbo Shang. 2022. Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns. In Findings of the Association for Computational Linguistics: ACL 2022, pages 916–925, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns (Wang et al., Findings 2022)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2022.findings-acl.74.pdf
Video: https://preview.aclanthology.org/ingest-acl-2023-videos/2022.findings-acl.74.mp4
Data: GLUE, LRA, QNLI
