Xin Wang
Other people with similar names: Xin Eric Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang, Xin Wang
Unverified author pages with similar names: Xin Wang
2026
Is the Attention Matrix Really the Key to Self-Attention in Multivariate Long-Term Time Series Forecasting?
Xinyu Li | Kexi Chen | Jiajie Shen | Ying Zheng | Hong Lu | Jin Zhao | Xin Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In multivariate long-term time series forecasting, it is widely believed that the effectiveness of self-attention arises from its attention matrix. We challenge this assumption with a counterintuitive finding: our experiments, conducted on three classic and three recent Transformer models, show that dot-product attention can be replaced by element-wise operations without token interaction, such as addition and the Hadamard product, while maintaining or even improving accuracy. This leads to our central hypothesis: the effectiveness of self-attention in this task stems not from the dynamic attention matrix, but from the multi-branch feature extraction inherent in the parallel projections to the Query, Key, and Value matrices and their fusion. To validate this, we construct a simple multi-branch MLP that isolates the ‘multi-branch mapping with element-wise operation’ structure from the Transformer and show that it achieves competitive performance. Our results indicate that the source of performance in self-attention has been misattributed: the true benefit appears to lie in the architectural principle of multi-branch mapping and fusion, not in the attention matrix.
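The contrast described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say exactly how the Query, Key, and Value branches are combined, so the fusion rules below (`q + k + v` and `q * k * v`), the function names, the single-head setup, and the toy dimensions are all assumptions made for illustration. The sketch only shows the structural difference: dot-product attention mixes information across tokens, while element-wise fusion of the three projections does not.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def project(x, w_q, w_k, w_v):
    # Three parallel linear branches applied to the same input tokens.
    return x @ w_q, x @ w_k, x @ w_v

def dot_product_attention(x, w_q, w_k, w_v):
    # Standard single-head scaled dot-product attention (no masking).
    q, k, v = project(x, w_q, w_k, w_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def elementwise_fusion(x, w_q, w_k, w_v, op="hadamard"):
    # Assumed fusion rule: combine the three branches position by position,
    # so each output token depends only on the matching input token.
    q, k, v = project(x, w_q, w_k, w_v)
    if op == "add":
        return q + k + v
    if op == "hadamard":
        return q * k * v
    raise ValueError(f"unknown op: {op}")

# Toy data: 4 tokens with 8 features each (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

# Perturb only the first token and compare the outputs for the other tokens.
x_perturbed = x.copy()
x_perturbed[0] += 1.0

fused = elementwise_fusion(x, w_q, w_k, w_v, op="hadamard")
fused_perturbed = elementwise_fusion(x_perturbed, w_q, w_k, w_v, op="hadamard")
attended = dot_product_attention(x, w_q, w_k, w_v)
attended_perturbed = dot_product_attention(x_perturbed, w_q, w_k, w_v)

# True when tokens 1..3 are unaffected by the change to token 0.
fusion_is_token_local = np.allclose(fused[1:], fused_perturbed[1:])
attention_is_token_local = np.allclose(attended[1:], attended_perturbed[1:])
```

Under these assumptions, `fusion_is_token_local` holds because the element-wise variant never reads another token, whereas the attention output for tokens 1 to 3 shifts when token 0 changes. The multi-branch MLP mentioned in the abstract presumably follows the same pattern with learned branches, though its exact layout is not specified here.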