Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers

Jonghyun Hong, Sungyoon Lee


Abstract
Attention-based language models commonly rely on the softmax function to convert attention logits into probability distributions. However, this softmax re-weighting can lead to *attention entropy collapse*, in which attention concentrates disproportionately on a single token, ultimately causing training instability. In this work, we identify the high *variance sensitivity* of softmax as a primary cause of this collapse. We show that *entropy-stable* attention methods, which either control or are insensitive to the variance of the attention logits, can prevent entropy collapse and enable more stable training. We provide empirical evidence of this effect in both large language models (LLMs) and a small Transformer composed solely of self-attention, and we support our findings with theoretical analysis. Moreover, we show that the concentration of attention probabilities increases the norm of the attention probability matrix, leading to exploding gradients.
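
The mechanism the abstract describes, that raising the variance of the attention logits drives softmax toward a near-one-hot distribution with low entropy and a larger row norm, can be illustrated numerically. The Python sketch below is a minimal illustration of that variance sensitivity, not the authors' code; the 64-token logit row, the Gaussian initialization, and the list of scale factors are assumptions made for this example.

    # Illustrative sketch (not the paper's implementation): scaling a fixed
    # logit vector raises its standard deviation; softmax entropy then
    # collapses while the probability row norm grows toward 1 (one-hot).
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        z = z - z.max()                 # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def entropy(p):
        # Shannon entropy in nats; small epsilon guards log(0)
        return -np.sum(p * np.log(p + 1e-12))

    logits = rng.normal(size=64)        # one hypothetical row of attention logits
    for scale in [0.5, 1.0, 2.0, 4.0, 8.0]:
        p = softmax(scale * logits)     # multiplying logits scales their std dev
        print(f"std~{scale:4.1f}  entropy={entropy(p):6.3f}  ||p||_2={np.linalg.norm(p):.3f}")

As the scale factor (and hence the logit standard deviation) grows, the printed entropy falls toward zero while the L2 norm of the probability row approaches 1, matching the collapse and norm-growth behavior the abstract links to exploding gradients.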
Anthology ID:
2025.emnlp-main.421
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8371–8389
URL:
https://preview.aclanthology.org/corrections-2025-11/2025.emnlp-main.421/
DOI:
10.18653/v1/2025.emnlp-main.421
Cite (ACL):
Jonghyun Hong and Sungyoon Lee. 2025. Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8371–8389, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers (Hong & Lee, EMNLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-11/2025.emnlp-main.421.pdf
Checklist:
2025.emnlp-main.421.checklist.pdf