Challenging Quadratic Attention - A Holistic View On the Rise of Alternative Language Model Architectures

Alexander M. Fichtl, Jeremias Bohn, Josefin Kelber, Edoardo Mosca, Georg Groh


Abstract
Transformers have dominated sequence processing tasks, most notably language modeling, for the past seven years. However, the inherent quadratic complexity of their attention mechanism remains a significant bottleneck as context length increases. We review and distill recent efforts to overcome this bottleneck, including advances in (sub-quadratic) attention variants, recurrent neural networks, state space models, and hybrid architectures. We critically analyze these approaches with regard to compute and memory complexity, benchmark results, and fundamental limitations to assess whether the dominance of pure-attention transformers may soon be challenged. We consider this possible, particularly in domain-specific and edge-device applications.
Anthology ID:
2026.bigpicture-main.6
Volume:
Proceedings of The Big Picture v2: Crafting a Research Narrative
Month:
July
Year:
2026
Address:
San Diego, CA, USA
Editors:
Yanai Elazar, Allyson Ettinger, Nora Kassner, Sebastian Ruder
Venues:
BigPicture | WS
Publisher:
Association for Computational Linguistics
Pages:
60–81
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bigpicture-main.6/
Cite (ACL):
Alexander M. Fichtl, Jeremias Bohn, Josefin Kelber, Edoardo Mosca, and Georg Groh. 2026. Challenging Quadratic Attention - A Holistic View On the Rise of Alternative Language Model Architectures. In Proceedings of The Big Picture v2: Crafting a Research Narrative, pages 60–81, San Diego, CA, USA. Association for Computational Linguistics.
Cite (Informal):
Challenging Quadratic Attention - A Holistic View On the Rise of Alternative Language Model Architectures (Fichtl et al., BigPicture 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bigpicture-main.6.pdf