@inproceedings{zhang-zhou-2025-continuous,
title = "Continuous-Time Attention: {PDE}-Guided Mechanisms for Long-Sequence Transformers",
author = "Zhang, Yukun and
Zhou, Xueqing",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.1097/",
doi = "10.18653/v1/2025.emnlp-main.1097",
pages = "21654--21674",
ISBN = "979-8-89176-332-6",
abstract = "We present Continuous-Time Attention, a novel framework that infuses partial differential equations (PDEs) into the Transformer{'}s attention mechanism to better handle long sequences. Instead of relying on a static attention matrix, we allow attention weights to evolve along a pseudo-time dimension governed by diffusion, wave, or reaction-diffusion dynamics. This dynamic process systematically smooths local noise, strengthens long-range dependencies, and improves gradient stability during training.Our theoretical analysis shows that PDE-driven attention mitigates the exponential decay of distant interactions and improves the optimization landscape. Empirically, Continuous-Time Attention achieves consistent performance gains over both standard and long-sequence Transformer variants across a range of tasks. These results suggest that embedding continuous-time dynamics into attention mechanisms is a promising direction for enhancing global coherence and scalability in Transformer models. Code is publicly available at:https://github.com/XueqingZhou/Continuous-Time-Attention"
}
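Below is a minimal, illustrative sketch of the diffusion variant described in the abstract: raw attention scores evolve for a few explicit-Euler pseudo-time steps under a 1-D heat equation along the key axis before the softmax. This is not the authors' implementation (see the linked GitHub repository for the official code); the function name and hyperparameters (`diffuse_attention`, `num_steps`, `dt`, `diffusivity`) are assumptions chosen for illustration, and PyTorch is used for convenience.

```python
import torch
import torch.nn.functional as F


def diffuse_attention(scores: torch.Tensor, num_steps: int = 4,
                      dt: float = 0.1, diffusivity: float = 1.0) -> torch.Tensor:
    """Evolve raw attention scores with explicit Euler steps of a 1-D heat
    equation along the key axis, then renormalize with a softmax.

    Note: this is a hedged sketch of the general idea, not the paper's code.
    The explicit scheme stays stable when dt * diffusivity <= 0.5.
    """
    a = scores
    for _ in range(num_steps):
        lap = torch.zeros_like(a)
        # Interior points: second difference along the key (last) axis.
        lap[..., 1:-1] = a[..., :-2] + a[..., 2:] - 2.0 * a[..., 1:-1]
        # Zero-flux (Neumann) boundaries at the first and last key position.
        lap[..., 0] = a[..., 1] - a[..., 0]
        lap[..., -1] = a[..., -2] - a[..., -1]
        a = a + dt * diffusivity * lap  # explicit Euler pseudo-time step
    return F.softmax(a, dim=-1)


# Toy usage with a (batch, heads, queries, keys) score tensor.
scores = torch.randn(2, 4, 16, 16)
attn = diffuse_attention(scores)              # smoothed, renormalized weights
output = attn @ torch.randn(2, 4, 16, 32)     # standard attention readout
```

The diffusion step smooths the score rows before normalization; the paper also describes wave and reaction-diffusion dynamics, which would replace the Laplacian update above with the corresponding discretized operator.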