Multi-component Causal Tracing in Large Language Models

Zirui Yan, Dennis Wei, Dmitriy A Katz, Prasanna Sattigeri, Ali Tajer


Abstract
Causal tracing systematically intervenes on a large language model’s (LLM’s) internal representations to uncover and quantify the causal pathways that link specific inputs or computations to metrics of interest that measure the LLM’s behavior. Building on previous single-component and single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. The framework identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) most critical to a target performance metric (e.g., accuracy or fairness), and it supports flexible interventions and a wide range of metrics. To address the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that leverages soft interventions and a carefully designed metric transformation. Together, these convert the combinatorial search into a continuous problem that can be solved efficiently under proper constraints and whose solution yields binary decisions for selecting components. Experimental results demonstrate that the proposed method efficiently identifies subsets of the model’s components that have a high impact on the target metric, outperforming existing baseline approaches.
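The relaxation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy linear metric, the weights `w`, the budget `k`, and the function names `project_capped_simplex`, `metric`, and `select_components` are all assumptions made for illustration, and the paper's metric transformation is not reproduced here. The sketch relaxes a binary selection mask to values in [0, 1] (a soft intervention), runs projected gradient descent under a budget constraint, and rounds the result to a binary choice of `k` components.

```python
import numpy as np

def project_capped_simplex(v, k, iters=60):
    """Euclidean projection of v onto {m : 0 <= m <= 1, sum(m) = k}.

    Uses bisection on a scalar shift tau so that clip(v - tau, 0, 1) sums to k.
    """
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > k:
            lo = tau  # mask too large: shift further down
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

def metric(m, w):
    """Toy metric (assumption): component i contributes w[i] * (1 - m[i]).

    m[i] = 0 leaves the component intact, m[i] = 1 fully ablates it,
    and values in between act as a soft intervention.
    """
    return float(w @ (1.0 - m))

def select_components(w, k, steps=200, lr=0.1):
    """Pick k components whose ablation most reduces the toy metric."""
    n = len(w)
    m = np.full(n, k / n)  # feasible starting point: uniform soft mask
    for _ in range(steps):
        grad = -w  # gradient of the toy metric with respect to m
        m = project_capped_simplex(m - lr * grad, k)
    chosen = sorted(np.argsort(-m)[:k].tolist())  # round to a binary decision
    return chosen, m

# Hypothetical importance weights for five components.
w = np.array([0.1, 0.9, 0.3, 0.7, 0.05])
chosen, soft_mask = select_components(w, k=2)
print(chosen, np.round(soft_mask, 3))
```

With a linear toy metric the continuous solution already sits at a vertex of the constraint set, so the final rounding step is trivial; for a real model the metric would be evaluated through forward passes, and the relaxed solution would generally need a more careful rounding step.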
Anthology ID:
2026.acl-long.154
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3398–3418
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.154/
Cite (ACL):
Zirui Yan, Dennis Wei, Dmitriy A Katz, Prasanna Sattigeri, and Ali Tajer. 2026. Multi-component Causal Tracing in Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3398–3418, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Multi-component Causal Tracing in Large Language Models (Yan et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.154.pdf
Checklist:
 2026.acl-long.154.checklist.pdf