Beyond Position: the emergence of wavelet-like properties in Transformers

Valeria Ruscio, Umberto Nanni, Fabrizio Silvestri


Abstract
This paper studies how Transformer models with Rotary Position Embeddings (RoPE) develop emergent, wavelet-like properties that compensate for the positional encoding’s theoretical limitations. Through an analysis spanning model scales, architectures, and training checkpoints, we show that attention heads evolve to implement multi-resolution processing analogous to wavelet transforms. We demonstrate that this scale-invariant behavior is unique to RoPE, emerges through distinct evolutionary phases during training, and statistically adheres to the fundamental uncertainty principle. Our findings suggest that the effectiveness of modern Transformers stems from their remarkable ability to spontaneously develop optimal, multi-resolution decompositions to address inherent architectural constraints.
Anthology ID:
2025.acl-long.303
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6074–6088
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.acl-long.303/
DOI:
10.18653/v1/2025.acl-long.303
Cite (ACL):
Valeria Ruscio, Umberto Nanni, and Fabrizio Silvestri. 2025. Beyond Position: the emergence of wavelet-like properties in Transformers. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6074–6088, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Beyond Position: the emergence of wavelet-like properties in Transformers (Ruscio et al., ACL 2025)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.acl-long.303.pdf