Exploring Attention Attractors in Large Language Models

Ziheng Wang, Zihao Yue, Wenxuan Wang, Qin Jin


Abstract
This paper explores attention attractors, tokens that draw disproportionately high attention, in large language models. We analyze them from three perspectives: (1) Functionality: We demonstrate their role in aggregating information from preceding contexts to facilitate future predictions. (2) Distribution: Through layer-wise and token-wise analysis, we reveal that attention attractors are widely distributed across layers but predominantly originate from low-semantic words like "_the". (3) Mechanism: We demonstrate a correlation between the attention weights allocated to tokens and the values of specific activation dimensions of those tokens. We hope these findings provide new insights into the attention mechanisms of large language models and inspire further exploration.
Anthology ID:
2026.acl-long.51
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1148–1160
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.51/
Cite (ACL):
Ziheng Wang, Zihao Yue, Wenxuan Wang, and Qin Jin. 2026. Exploring Attention Attractors in Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1148–1160, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Exploring Attention Attractors in Large Language Models (Wang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.51.pdf
Checklist:
2026.acl-long.51.checklist.pdf