Mechanistic Interpretability of Animacy Effects on Structure Choice in GPT-2

Yue Li, Yan Cong, Elaine J. Francis


Abstract
Language models (LMs) exhibit human-like behavior across linguistic tasks, yet behavioral similarity does not establish mechanistic correspondence. Animacy — whether an entity is alive and sentient — is a well-documented semantic feature shaping linguistic behavior in humans. Although LMs show animacy sensitivity behaviorally, the mechanistic basis remains unexplored. In this study, we probe GPT-2 Small’s internal circuitry to test whether animacy representations causally drive syntactic structure choice. Activation patching confirms causality: swapping animacy representations in the model shifts its downstream output. Critically, bidirectional patching reveals that animacy conditions differ in how strongly they commit to a structure: some animacy configurations resist perturbation and exert strong causal influence, while others remain flexible. We identify 22 attention heads mediating these effects, split between passive-promoting and passive-suppressing populations, suggesting GPT-2 Small’s structure choice likely emerges from internal competition between opposing heads. These findings provide mechanistic grounding for animacy effects documented in extensive psycholinguistics research and demonstrate how interpretability methods can enrich and test psycholinguistic theory.
Anthology ID: 2026.conll-main.38
Volume: Proceedings of the 30th Conference on Computational Natural Language Learning
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Claire Bonial, Yevgeni Berzak
Venues: CoNLL | WS
Publisher: Association for Computational Linguistics
Pages: 626–640
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.38/
Cite (ACL): Yue Li, Yan Cong, and Elaine J. Francis. 2026. Mechanistic Interpretability of Animacy Effects on Structure Choice in GPT-2. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 626–640, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Mechanistic Interpretability of Animacy Effects on Structure Choice in GPT-2 (Li et al., CoNLL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.38.pdf