Probing the Attention Representation of Filler-Gap Dependency in Transformers

Ruoqing Yao, Pranav Anand


Abstract
Prior work (Wilcox et al., 2024; Kobzeva et al., 2025) shows that neural language models exhibit filled-gap and unlicensed-gap effects, yet these effects attenuate with intervening clauses, especially with intervening overt complementizers. We conduct attention probing experiments on GPT-2 and identify two heads (layer 5, head 2 and layer 8, head 9) whose verb-to-filler attention correlates with filled-gap surprisal. Both heads are sensitive to clausal intervention but not to linear distance, and they show distinct patterns in islands. When overt complementizers intervene, the attention of layer 5, head 2 redistributes from the filler to the nearest complementizer, producing an “attend-closest-C” pattern, while the attention of layer 8, head 9 does not. These results suggest that LMs may have allocated distinct, linguistically meaningful representations from the training data to individual attention heads, but that they fail to fully learn the correct grammar of filler-gap dependencies (FGDs).
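
The probing measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes one head's attention is available as a square matrix indexed [query position][key position], which is the layout of each per-head slice returned by Hugging Face Transformers when a model is called with output_attentions=True. The function names, token positions, and all numbers below are hypothetical and chosen only for illustration. The abstract does not say whether its layer and head numbers are zero-based, so the sketch takes an already-selected head.

```python
import math


def verb_to_filler_attention(head_attn, verb_pos, filler_pos):
    """Weight that the verb (query row) places on the filler (key column)."""
    if filler_pos > verb_pos:
        raise ValueError("a causal LM cannot attend to a later token")
    return head_attn[verb_pos][filler_pos]


def surprisal_bits(prob):
    """Surprisal of a token in bits, given its conditional probability."""
    return -math.log2(prob)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy, made-up values for illustration only (not data from the paper).
# Rows are query tokens, columns are key tokens; each row sums to 1.
toy_head_attn = [
    [1.0, 0.0, 0.0],
    [0.4, 0.6, 0.0],
    [0.7, 0.1, 0.2],
]
# Hypothetical positions: filler at index 0, verb at index 2.
toy_weight = verb_to_filler_attention(toy_head_attn, verb_pos=2, filler_pos=0)

# One attention weight and one filled-gap token probability per toy item.
toy_weights = [0.1, 0.4, 0.7]
toy_surprisals = [surprisal_bits(p) for p in (0.5, 0.25, 0.125)]
toy_r = pearson(toy_weights, toy_surprisals)
```

A real analysis would collect one verb-to-filler weight and one filled-gap surprisal per test sentence for each head before computing the correlation.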
Anthology ID:
2026.scil-main.23
Volume:
Proceedings of the Society for Computation in Linguistics 2026
Month:
July
Year:
2026
Address:
San Diego, CA
Editors:
Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues:
SCiL | WS
Publisher:
Association for Computational Linguistics
Pages:
258–261
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.23/
Cite (ACL):
Ruoqing Yao and Pranav Anand. 2026. Probing the Attention Representation of Filler-Gap Dependency in Transformers. In Proceedings of the Society for Computation in Linguistics 2026, pages 258–261, San Diego, CA. Association for Computational Linguistics.
Cite (Informal):
Probing the Attention Representation of Filler-Gap Dependency in Transformers (Yao & Anand, SCiL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.23.pdf