Characterizing the Expressivity of Local Attention in Transformers

Jiaoda Li; Ryan Cotterell

Abstract

The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before generating the next token. One common variant of attention is called local attention, which restricts each token to aggregating information from a bounded window of predecessors, reducing the quadratic cost of global attention to linear. Although this restriction is usually motivated by efficiency, it has also been found to improve model quality, a phenomenon that has so far lacked a satisfactory explanation. We provide a formal account of this phenomenon in terms of recognizer expressivity. It has been shown that fixed-precision transformers with global attention correspond to a fragment of linear temporal logic containing a single past operator. We additionally prove that adding local attention introduces a second temporal operator, strictly enlarging the class of recognizable regular languages. Moreover, global and local attention are expressively complementary: neither subsumes the other, and combining them yields the richest fragment. Experiments on formal language recognition and natural language modeling corroborate the theory, showing that hybrid global–local transformers outperform their global-only counterparts.
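As an illustration of the distinction drawn in the abstract, the following minimal sketch contrasts the attention pattern of global (causal) attention with that of local attention. It is not taken from the paper: the function names, the window size, and the choice to count the current position as part of its own window are all assumptions made here for illustration.

```python
# Illustrative sketch only; names and conventions are assumptions, not the paper's.

def causal_mask(n):
    """Global (causal) attention: position i may attend to every j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def local_mask(n, window):
    """Local attention: position i may attend only to the `window` most
    recent positions, i - window < j <= i (assumed to include i itself)."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


def count_allowed(mask):
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n, window = 6, 2
    for name, mask in [("global", causal_mask(n)), ("local", local_mask(n, window))]:
        print(name, count_allowed(mask))
        for row in mask:
            print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions the number of permitted pairs grows quadratically in the sequence length for the global mask, but only linearly for the local mask once the window size is held fixed, which is the cost reduction the abstract refers to.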

Anthology ID: 2026.acl-long.1739
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 37485–37507
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1739/
Cite (ACL): Jiaoda Li and Ryan Cotterell. 2026. Characterizing the Expressivity of Local Attention in Transformers. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 37485–37507, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Characterizing the Expressivity of Local Attention in Transformers (Li & Cotterell, ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1739.pdf
Checklist: 2026.acl-long.1739.checklist.pdf
