Yue Li

2026

Mechanistic Interpretability of Animacy Effects on Structure Choice in GPT-2
Yue Li | Yan Cong | Elaine J. Francis
Proceedings of the 30th Conference on Computational Natural Language Learning

Language models (LMs) exhibit human-like behavior across linguistic tasks, yet behavioral similarity does not establish mechanistic correspondence. Animacy — whether an entity is alive and sentient — is a well-documented semantic feature shaping linguistic behavior in humans. Although LMs show animacy sensitivity behaviorally, the mechanistic basis remains unexplored. In this study, we probe GPT-2 Small’s internal circuitry to test whether animacy representations causally drive syntactic structure choice. Activation patching confirms causality: swapping animacy representations in the model shifts its downstream output. Critically, bidirectional patching reveals that animacy conditions differ in how strongly they commit to a structure: some animacy configurations resist perturbation and exert strong causal influence, while others remain flexible. We identify 22 attention heads mediating these effects, split between passive-promoting and passive-suppressing populations, suggesting GPT-2 Small’s structure choice likely emerges from internal competition between opposing heads. These findings provide mechanistic grounding for animacy effects documented in extensive psycholinguistics research and demonstrate how interpretability methods can enrich and test psycholinguistic theory.

2025

Beyond Binary Animacy: A Multi-Method Investigation of LMs’ Sensitivity in English Object Relative Clauses
Yue Li | Yan Cong | Elaine J. Francis
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Animacy is a well-documented factor affecting language production, but its influence on Language Models (LMs) in complex structures like Object Relative Clauses (ORCs) remains underexplored. This study examines LMs’ sensitivity to animacy in English ORC structure choice (passive vs. active) using surprisal-based and prompting-based analyses, alongside human baselines. In surprisal-based analysis, DistilGPT-2 best mirrored human preferences, while GPT-Neo and BERT-base showed rigid biases, diverging from human patterns. Prompting-based analysis expanded testing to GPT-4o-mini, Gemini models, and DeepSeek-R1, revealing GPT-4o-mini’s stronger human alignment but limited animacy sensitivity in Gemini models and DeepSeek-R1. Some LMs exhibited inconsistencies between analyses, reinforcing that prompting alone is unreliable for assessing linguistic competence. Corpus analysis confirmed that training data alone cannot fully explain animacy sensitivity, suggesting emergent animacy-aware representations. These findings underscore the interaction between training data, model architecture, and linguistic generalization, highlighting the need for integrating structured linguistic knowledge into LMs to enhance their alignment with human sentence processing mechanisms.

Co-authors

Yan Cong 2
Elaine J. Francis 2
