Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models

Gregory Polyakov, Christian Hepting, Carsten Eickhoff, Seyed Ali Bahrainian


Abstract
Large language models (LLMs) exhibit sophisticated behavior, notably solving arithmetic with only a few in-context examples (ICEs). Yet the computations that connect those examples to the answer remain opaque. We probe four open-weight LLMs (Pythia-12B, Llama-3.1-8B, MPT-7B, and OPT-6.7B) on basic arithmetic tasks to examine how they process ICEs. Our study integrates activation patching, information-flow analysis, automatic circuit discovery, and the logit-lens perspective into a unified pipeline. Within this framework we isolate partial-sum representations in three-operand tasks, investigate their influence on final logits, and derive linear function vectors that characterize tasks and align with ICE-induced activations. Controlled ablations show that strict pattern consistency in the formatting of ICEs guides the models more strongly than the symbols chosen or even the factual correctness of the examples. By unifying four complementary interpretability tools, this work delivers one of the most comprehensive interpretability studies of LLM arithmetic to date, and the first on three-operand tasks. Our code is publicly available.
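The logit-lens perspective mentioned in the abstract can be sketched in a few lines: project each layer's residual stream at the final position through the model's unembedding matrix and inspect which token that layer would predict. The snippet below is a minimal, illustrative sketch assuming a Hugging Face GPT-NeoX-style checkpoint such as Pythia; the prompt and variable names are ours for illustration and are not taken from the paper's released code.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-12b"  # a smaller Pythia checkpoint also works for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A three-operand arithmetic prompt with two in-context examples (illustrative).
prompt = "2+3+4=9\n5+1+2=8\n7+2+3="
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

final_norm = model.gpt_neox.final_layer_norm   # GPT-NeoX / Pythia module names
unembed = model.embed_out                      # output (unembedding) head

for layer_idx, hidden in enumerate(out.hidden_states):
    h = final_norm(hidden[0, -1])              # residual stream at the last position
    logits = unembed(h)
    top_token = tokenizer.decode(logits.argmax().item())
    print(f"layer {layer_idx:2d} -> {top_token!r}")

Reading off the depth at which the answer token first dominates this per-layer projection is the kind of signal a logit-lens analysis relies on when tracing where intermediate results, such as partial sums, become linearly decodable.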
Anthology ID: 2025.emnlp-main.92
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1758–1777
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.92/
Cite (ACL): Gregory Polyakov, Christian Hepting, Carsten Eickhoff, and Seyed Ali Bahrainian. 2025. Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 1758–1777, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models (Polyakov et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.92.pdf
Checklist: 2025.emnlp-main.92.checklist.pdf