ReGLA: Refining Gated Linear Attention
Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, Philippe Langlais
Abstract
Recent advancements in Large Language Models (LLMs) have set them apart through exceptional performance on complex language modelling tasks. However, these models are also known for their significant computational and storage requirements, primarily due to the quadratic computational complexity of softmax attention. To mitigate this issue, linear attention has been designed to reduce the quadratic space-time complexity inherent in standard transformers. In this work, we embarked on a comprehensive exploration of three key components that substantially impact the performance of the Gated Linear Attention module: feature maps, normalization, and the gating mechanism. We developed a feature mapping function that addresses crucial issues overlooked by previous proposals. We then offered further rationale for integrating normalization layers to stabilize the training process. Moreover, we explored the saturation phenomenon of the gating mechanism and augmented it with a refining module. We conducted extensive experiments and showed that our architecture outperforms previous Gated Linear Attention mechanisms on a range of tasks, including training from scratch and post-linearization with continual pre-training.
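For readers unfamiliar with the module being refined, the following is a minimal sketch of a generic gated linear attention recurrence in NumPy. The ELU-based feature map, sigmoid gates, and running normalizer shown here are illustrative assumptions, not the specific ReGLA components proposed in the paper; the sketch only shows why the recurrence runs in linear rather than quadratic time in sequence length.

```python
# Minimal sketch of a generic gated linear attention recurrence (illustrative only;
# the feature map, gate parameterization, and normalization below are assumptions,
# not the exact ReGLA design described in the paper).
import numpy as np

def feature_map(x):
    # Placeholder positive feature map (assumption): ELU(x) + 1 keeps features non-negative.
    return np.where(x > 0, x + 1.0, np.exp(x))

def gated_linear_attention(Q, K, V, G, eps=1e-6):
    """Causal gated linear attention over one sequence.

    Q, K: (T, d_k); V: (T, d_v); G: (T, d_k) gates in (0, 1).
    Runs in O(T * d_k * d_v) time with an O(d_k * d_v) recurrent state,
    instead of the O(T^2) cost of softmax attention.
    """
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))   # recurrent key-value state
    z = np.zeros(d_k)          # running normalizer
    out = np.zeros((T, d_v))
    for t in range(T):
        q, k, g = feature_map(Q[t]), feature_map(K[t]), G[t]
        S = g[:, None] * S + np.outer(k, V[t])   # gated state update
        z = g * z + k                            # gated normalizer update
        out[t] = (q @ S) / (q @ z + eps)         # normalized read-out
    return out

# Usage: random toy inputs with sigmoid gates.
rng = np.random.default_rng(0)
T, d_k, d_v = 8, 4, 4
Q, K, V = (rng.standard_normal((T, d)) for d in (d_k, d_k, d_v))
G = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d_k))))  # gates in (0, 1)
print(gated_linear_attention(Q, K, V, G).shape)  # (8, 4)
```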
- Anthology ID:
- 2025.naacl-long.147
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2884–2898
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.147/
- Cite (ACL):
- Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, and Philippe Langlais. 2025. ReGLA: Refining Gated Linear Attention. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2884–2898, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- ReGLA: Refining Gated Linear Attention (Lu et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.147.pdf