Abstract
Transformers have recently been used widely across many fields and have achieved remarkable results. However, Transformer-based models still struggle to process long sequences because self-attention scales quadratically with sequence length. Although some models use sparse attention to reduce computational complexity, their hand-crafted attention patterns cannot adaptively select useful tokens according to context. In this paper, we therefore propose A2-Former, an efficient Transformer with adaptive attention for long-sequence modeling. It selects useful tokens automatically in sparse attention through learnable position vectors, which consist of meta position and offset position vectors. Because the learned offset positions are not integers, we use interpolation to gather the corresponding vectors from the input embedding matrix via the neighboring discrete indices. Experiments on Long Range Arena (LRA), a systematic and unified benchmark with diverse tasks, show that our model further improves performance compared with other sparse-based Transformers.
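To make the gathering step concrete, the following minimal sketch (not from the paper; the function name, variable names, and the choice of linear interpolation are assumptions) gathers embedding rows at fractional positions formed by a meta position plus a learned offset, blending the two neighboring integer-indexed rows:

```python
import numpy as np

def gather_interpolated(embeddings: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Gather rows of `embeddings` at (possibly fractional) `positions` by
    linearly interpolating between the two neighboring integer indices."""
    seq_len = embeddings.shape[0]
    # Clamp fractional positions to the valid index range [0, seq_len - 1].
    positions = np.clip(positions, 0.0, seq_len - 1)
    lower = np.floor(positions).astype(int)      # discrete index below
    upper = np.minimum(lower + 1, seq_len - 1)   # discrete index above
    frac = (positions - lower)[:, None]          # interpolation weight in [0, 1)
    return (1.0 - frac) * embeddings[lower] + frac * embeddings[upper]

# Hypothetical example: a length-8 sequence with 4-dimensional embeddings.
x = np.random.randn(8, 4)
meta = np.array([1.0, 3.0, 5.0])      # assumed meta positions
offset = np.array([0.25, -0.5, 0.8])  # assumed learned offsets (non-integer)
selected = gather_interpolated(x, meta + offset)
print(selected.shape)  # (3, 4)
```

In the paper's setting the offsets would be produced by the model and trained end to end; a linear blend like the one above keeps the gathering step differentiable with respect to the fractional positions.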
- Anthology ID: 2023.findings-acl.546
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8602–8610
- URL: https://aclanthology.org/2023.findings-acl.546
- DOI: 10.18653/v1/2023.findings-acl.546
- Cite (ACL): Xuanyu Zhang, Zhepeng Lv, and Qing Yang. 2023. Adaptive Attention for Sparse-based Long-sequence Transformer. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8602–8610, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Adaptive Attention for Sparse-based Long-sequence Transformer (Zhang et al., Findings 2023)
- PDF: https://preview.aclanthology.org/landing_page/2023.findings-acl.546.pdf