@inproceedings{wu-etal-2025-unveiling-multimodal,
title = "Unveiling Multimodal Processing: Exploring Activation Patterns in Multimodal {LLM}s for Interpretability and Efficiency",
author = "Wu, Chuan and
Su, Meng and
Fang, Youxuan and
Zhu, Shaolin",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.478/",
doi = "10.18653/v1/2025.findings-emnlp.478",
pages = "9005--9016",
ISBN = "979-8-89176-335-7",
abstract = "Recent Multimodal Large Language Models (MLLMs) have achieved remarkable advancements, yet their internal mechanisms for concurrently processing diverse modalities like text, image, and audio remain largely opaque. In this paper, we propose a methodology to convert dense MLLMs into fine-grained Mixture-of-Experts (MoE) architectures. This allows us to visually investigate their multimodal activation patterns through expert activation frequency heatmaps. Conducting comprehensive experiments on representative MLLMs, we analyze the similarities and differences in internal neuron activations when handling distinct modalities. Specifically, we examine the distribution of high-frequency activated experts, the distinct roles of high-frequency (e.g., fundamental logic) and low-frequency (e.g., domain-specific concepts) multimodal shared experts, and the prevalence and localization of modality-specific experts. Furthermore, we explore leveraging these discovered activation discrepancies to guide sparse activation and model pruning. Experimental results demonstrate that our approach substantially outperforms random expert pruning and can achieve comparable or even superior performance to the original unpruned models while utilizing significantly fewer active parameters. Our work not only sheds light on the multimodal processing mechanisms within MLLMs but also provides a practical pathway toward developing more interpretable and efficient multimodal systems."
}
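As a rough illustration of the activation-frequency-guided expert pruning the abstract describes, here is a minimal Python sketch. This is not the authors' implementation: the function name, the keep_ratio parameter, and the toy counts are all hypothetical. It assumes per-expert activation counts have already been collected on a calibration set and simply retains the most frequently activated experts:

    # Hypothetical sketch of frequency-guided expert pruning (not the
    # authors' code): rank fine-grained MoE experts by how often they
    # fired on a calibration set, and keep only the top fraction.
    import numpy as np

    def prune_experts_by_frequency(activation_counts, keep_ratio=0.5):
        """Return sorted indices of experts to keep.

        activation_counts: 1-D array, activation count per expert on a
                           calibration set.
        keep_ratio:        assumed parameter, fraction of experts kept.
        """
        num_keep = max(1, int(len(activation_counts) * keep_ratio))
        # High-frequency experts are assumed to carry shared, fundamental
        # functionality (per the abstract) and are kept first.
        order = np.argsort(activation_counts)[::-1]
        return np.sort(order[:num_keep])

    # Toy example: 8 experts with made-up calibration counts.
    counts = np.array([120, 3, 85, 40, 0, 97, 12, 66])
    print(prune_experts_by_frequency(counts, keep_ratio=0.5))  # [0 2 5 7]

The paper additionally distinguishes modality-specific from shared experts, so a faithful reimplementation would track per-modality frequencies rather than the single global count used in this sketch.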