Abstract
Transformers have recently been used widely across many fields and have achieved remarkable results. However, Transformer-based models still struggle to process long sequences because self-attention scales quadratically with sequence length. Although some models use sparse attention to reduce computational complexity, their hand-crafted attention patterns cannot adaptively select useful tokens according to context. In this paper, we therefore propose A2-Former, an efficient Transformer with adaptive attention for long-sequence modeling. It selects useful tokens automatically in sparse attention through learnable position vectors, which consist of meta position and offset position vectors. Because the learned offset positions are not integers, we use interpolation to gather the corresponding vectors from the input embedding matrix via the neighboring discrete indices. Experiments on Long Range Arena (LRA), a systematic and unified benchmark with diverse tasks, show that our model further improves performance compared with other sparse-based Transformers.
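To make the gathering step concrete, the following minimal sketch (not from the paper; the function name, variable names, and the choice of linear interpolation are assumptions) gathers embedding rows at fractional positions formed by a meta position plus a learned offset, blending the two neighboring integer-indexed rows:

```python
import numpy as np

def gather_interpolated(embeddings: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Gather rows of `embeddings` at (possibly fractional) `positions` by
    linearly interpolating between the two neighboring integer indices."""
    seq_len = embeddings.shape[0]
    # Clamp fractional positions to the valid index range [0, seq_len - 1].
    positions = np.clip(positions, 0.0, seq_len - 1)
    lower = np.floor(positions).astype(int)      # discrete index below
    upper = np.minimum(lower + 1, seq_len - 1)   # discrete index above
    frac = (positions - lower)[:, None]          # interpolation weight in [0, 1)
    return (1.0 - frac) * embeddings[lower] + frac * embeddings[upper]

# Hypothetical example: a length-8 sequence with 4-dimensional embeddings.
x = np.random.randn(8, 4)
meta = np.array([1.0, 3.0, 5.0])      # assumed meta positions
offset = np.array([0.25, -0.5, 0.8])  # assumed learned offsets (non-integer)
selected = gather_interpolated(x, meta + offset)
print(selected.shape)  # (3, 4)
```

In the paper's setting the offsets would be produced by the model and trained end to end; a linear blend like the one above keeps the gathering step differentiable with respect to the fractional positions.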
- Anthology ID: 2023.findings-acl.546
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8602–8610
- URL: https://aclanthology.org/2023.findings-acl.546
- DOI: 10.18653/v1/2023.findings-acl.546
- Cite (ACL): Xuanyu Zhang, Zhepeng Lv, and Qing Yang. 2023. Adaptive Attention for Sparse-based Long-sequence Transformer. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8602–8610, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Adaptive Attention for Sparse-based Long-sequence Transformer (Zhang et al., Findings 2023)
- PDF: https://preview.aclanthology.org/landing_page/2023.findings-acl.546.pdf