Reversed Attention: On The Gradient Descent Of Attention Layers In GPT

Shahar Katz, Lior Wolf


Abstract
The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of LMs, the backward pass of attention has been largely overlooked. In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as “Reversed Attention”. We visualize Reversed Attention and examine its properties, demonstrating its ability to elucidate the models’ behavior and edit dynamics. In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model’s weights, using a novel method called “attention patching”. In addition to enhancing the comprehension of how LMs configure attention layers during backpropagation, Reversed Attention maps contribute to a more interpretable backward pass.
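
The abstract describes Reversed Attention as a gradient signal that the backward pass computes for the attention maps. The sketch below is an illustration only, not the paper's released code: it assumes the Hugging Face transformers library with a GPT-2 checkpoint and eager attention (so that output_attentions=True returns tensors that participate in autograd), and it simply reads the gradients of the forward attention maps after a backward pass. The prompt and variable names are made up for the example, and the paper's exact definition of Reversed Attention may differ from this raw gradient.

import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

# Assumed setup: a GPT-2 checkpoint with eager attention so the returned
# attention probabilities are part of the autograd graph.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", attn_implementation="eager")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"], output_attentions=True)

# The attention maps are non-leaf tensors; ask autograd to keep their gradients.
for attn in outputs.attentions:
    attn.retain_grad()

outputs.loss.backward()

# One tensor per layer, shaped (batch, heads, query_len, key_len): the gradient
# of the loss with respect to each forward attention map, i.e. the backward-pass
# counterpart of the forward attention scores.
attention_gradients = [attn.grad for attn in outputs.attentions]
print(attention_gradients[0].shape)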
Anthology ID: 2025.naacl-long.52
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1125–1152
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.52/
Cite (ACL): Shahar Katz and Lior Wolf. 2025. Reversed Attention: On The Gradient Descent Of Attention Layers In GPT. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1125–1152, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Reversed Attention: On The Gradient Descent Of Attention Layers In GPT (Katz & Wolf, NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.52.pdf