Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? A Petroglyph Revisited

Kazuki Irie


Abstract
Do autoregressive Transformer language models require explicit positional encodings (PEs)? The answer is ‘no’, provided they have more than one layer: they can distinguish sequences with permuted tokens without the need for explicit PEs. This follows from the fact that a cascade of (permutation-invariant) set processors can collectively exhibit sequence-sensitive behavior in the autoregressive setting. This property has been known since early efforts (contemporary with GPT-2) to adopt the Transformer for language modeling. However, the result does not appear to have been well disseminated, leading to recent rediscoveries. This may be partially due to the sudden growth of the language modeling community after the advent of GPT-2/3, but perhaps also due to the lack of a clear explanation in prior work, even though the property was commonly understood by practitioners at the time. Here we review the long-forgotten explanation of why explicit PEs are nonessential for multi-layer autoregressive Transformers (in contrast, one-layer models require PEs to discern the order of their inputs), as well as the origin of this result, and hope to re-establish it as common knowledge.
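The abstract's central claim admits a compact numerical check. The following is a minimal sketch, not code from the paper, using arbitrary untrained weights and dimensions chosen purely for illustration: a single causal self-attention layer without PEs produces an identical last-position output when the preceding tokens are permuted, whereas a stack of two such layers generally does not.

```python
# Sketch (assumed setup, not the paper's code): causal self-attention without
# positional encodings. One layer is permutation-invariant at the last position;
# two stacked layers are not, because each first-layer output already encodes
# its own causal prefix.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # model dimension (arbitrary choice for illustration)

def causal_attention(x, Wq, Wk, Wv):
    """One causal softmax self-attention layer, no positional encodings."""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # block future positions
    scores[mask] = -np.inf
    att = np.exp(scores - scores.max(axis=-1, keepdims=True))
    att /= att.sum(axis=-1, keepdims=True)
    return x + att @ v  # residual connection

# Two layers of random (untrained) weights; training is irrelevant to the argument.
layers = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in range(2)]

def forward(x, n_layers):
    for Wq, Wk, Wv in layers[:n_layers]:
        x = causal_attention(x, Wq, Wk, Wv)
    return x

tokens = rng.normal(size=(4, d))     # embeddings of a 4-token sequence
permuted = tokens[[1, 0, 2, 3]]      # swap the first two tokens

for n in (1, 2):
    same = np.allclose(forward(tokens, n)[-1], forward(permuted, n)[-1])
    print(f"{n} layer(s): last-position output unchanged under permutation? {same}")
# Expected: True for 1 layer, False for 2 layers.
```

With one layer, the last position sees the earlier tokens only as an unordered key/value multiset, so permuting them changes nothing; with two layers, each first-layer output depends on its own causal prefix, so the second layer receives order-dependent inputs.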
Anthology ID:
2025.findings-acl.30
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
551–559
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.30/
DOI:
10.18653/v1/2025.findings-acl.30
Cite (ACL):
Kazuki Irie. 2025. Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? A Petroglyph Revisited. In Findings of the Association for Computational Linguistics: ACL 2025, pages 551–559, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? A Petroglyph Revisited (Irie, Findings 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.30.pdf