Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms

Minyeong Choe, Haehyun Cho, Changho Seo, Hyunil Kim


Abstract
Understanding how Transformer-based language models store and retrieve factual associations is critical for improving interpretability and enabling targeted model editing. Prior work, conducted primarily on GPT-style models, has identified MLP modules in early layers as key contributors to factual recall. However, it remains unclear whether these findings generalize to other autoregressive architectures. To address this, we conduct a comprehensive evaluation of factual recall across several model families (GPT, LLaMA, Qwen, and DeepSeek), analyzing where and how factual information is encoded and accessed. Notably, we find that Qwen-based models depart from previously reported patterns: attention modules in the earliest layers contribute more to factual recall than MLP modules. Our findings suggest that even within the autoregressive Transformer family, architectural variations can lead to fundamentally different mechanisms of factual recall.
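The page itself carries no code, but a common way to localize module contributions of the kind the abstract describes is to knock out a single module's output at the final token and measure the drop in probability of the correct completion. The sketch below illustrates that idea with PyTorch and Hugging Face Transformers; the model name ("gpt2"), the prompt, the answer token, and the zero-ablation choice are all illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch (not the paper's code): zero-ablate each layer's MLP or
# attention output at the final token and measure the resulting drop in
# the probability of the correct answer token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies GPT, LLaMA, Qwen, DeepSeek
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "The Eiffel Tower is located in the city of"
answer_id = tok.encode(" Paris")[0]  # illustrative fact/answer pair
inputs = tok(prompt, return_tensors="pt")

def answer_prob():
    # Probability assigned to the answer token at the final position.
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    return torch.softmax(logits, dim=-1)[answer_id].item()

base = answer_prob()

def ablate_last_token(module, args, output):
    # Zero out the module's contribution at the final token position.
    out = output[0] if isinstance(output, tuple) else output
    out = out.clone()
    out[:, -1, :] = 0.0
    return ((out,) + output[1:]) if isinstance(output, tuple) else out

# GPT-2 layout: blocks live in model.transformer.h; other families differ.
for layer_idx, block in enumerate(model.transformer.h):
    for name, module in (("mlp", block.mlp), ("attn", block.attn)):
        handle = module.register_forward_hook(ablate_last_token)
        drop = base - answer_prob()
        handle.remove()
        print(f"layer {layer_idx:2d} {name}: prob drop {drop:+.4f}")
```

Comparing per-layer drops for MLP versus attention modules across model families is the kind of measurement that could surface the Qwen pattern the abstract reports, though the paper's own methodology may differ.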
Anthology ID:
2025.emnlp-main.1448
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
28482–28501
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.1448/
DOI:
10.18653/v1/2025.emnlp-main.1448
Cite (ACL):
Minyeong Choe, Haehyun Cho, Changho Seo, and Hyunil Kim. 2025. Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 28482–28501, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms (Choe et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.1448.pdf
Checklist:
2025.emnlp-main.1448.checklist.pdf