All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

Siddarth Mamidanna, Daking Rai, Ziyu Yao, Yilun Zhou


Abstract
Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptrons allows every token to access and compute information based on all preceding tokens. In practice, to what extent are such operations present? In this paper, on mental math tasks (i.e., direct math calculation via next-token prediction without explicit reasoning), we investigate this question in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer in the next few layers, and forcing all computation to happen at the last token in the remaining layers. With two proposed techniques, Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP), we identify an All-for-One subgraph (AF1) with high accuracy on a wide variety of mental math tasks, where meaningful computation occurs very late (in terms of layer depth) and only at the last token, which receives information from other tokens in a few specific layers. Experiments show that this circuit is sufficient and necessary for high model performance, transfers across different models, and works on a variety of input styles. Ablations on different CAMA and ABP alternatives reveal their unique advantages over other methods, which may be of independent interest.
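For readers who want to experiment with this style of intervention, below is a minimal sketch in the spirit of the CAMA step: early-layer hidden states at all non-final positions are overwritten with dataset-mean activations, so only the last token carries input-specific computation downstream. This is not the authors' released code or exact procedure; the model name (gpt2), the layer cutoff, the prompt set, and the hook-based implementation are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"    # assumption: any decoder-only HF model exposing .transformer.h
N_EARLY_LAYERS = 6     # assumption: ablate blocks 0..5; the paper tunes this per model

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# 1) Estimate per-layer mean hidden states over a small set of same-format prompts.
prompts = ["3 + 4 = ", "12 * 7 = ", "56 - 19 = "]   # illustrative mental-math prompts
enc = tok(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    hs = model(**enc, output_hidden_states=True).hidden_states
mask = enc["attention_mask"].unsqueeze(-1)           # (batch, seq, 1); zero on padding
layer_means = [(h * mask).sum(dim=(0, 1)) / mask.sum() for h in hs]

# 2) Hook each early block so every position except the last is overwritten with
#    the mean, leaving only the final token input-specific in later layers.
def make_hook(mean_vec):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        patched = hidden.clone()
        patched[:, :-1, :] = mean_vec.to(hidden.dtype)
        return (patched,) + output[1:] if isinstance(output, tuple) else patched
    return hook

handles = [
    block.register_forward_hook(make_hook(layer_means[i + 1]))  # hidden_states[0] is the embedding
    for i, block in enumerate(model.transformer.h[:N_EARLY_LAYERS])
]

# 3) Run a query under the intervention and read off the greedy next-token prediction.
query = tok("23 + 45 = ", return_tensors="pt")
with torch.no_grad():
    logits = model(**query).logits
print(tok.decode(logits[0, -1].argmax(-1).item()))

for h in handles:
    h.remove()
```

An ABP-style variant would additionally constrain attention so that the last token only "peeks" at earlier positions in a few designated layers; the choice of those layers and the masking mechanics are studied in the paper and not reproduced here.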
Anthology ID:
2025.emnlp-main.1565
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
30735–30748
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1565/
Cite (ACL):
Siddarth Mamidanna, Daking Rai, Ziyu Yao, and Yilun Zhou. 2025. All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30735–30748, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens (Mamidanna et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1565.pdf
Checklist:
 2025.emnlp-main.1565.checklist.pdf