mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models

Jakub Muszyński, Paweł Pozorski, Maria Ganzha


Abstract
We present mllm-shap, an open-source Python platform for researchers and ML practitioners that extends Shapley value (SV) explainability from text-only large language models to multimodal LLMs (MLLMs) that jointly process text and audio. Building on the token-level SV framework introduced by TokenSHAP, mllm-shap addresses three challenges absent in the text-only setting: (1) modality-aware coalition masking that handles the coexistence of text tokens and dense audio encoder frames within a single input, (2) multi-turn conversation tracking with per-token role and modality metadata, and (3) audio token grouping via phonetic alignment that reduces the coalition space by 10–50 times. The platform ships as a pip-installable package implementing five SV estimation strategies – including a Complementary Contributions estimator with Neyman-optimal allocation that outperforms Monte Carlo baselines – together with an interactive web GUI for real-time attribution visualization. To our knowledge, mllm-shap is the first publicly available framework for complete, reproducible SV-based explainability of text-audio MLLMs. The package is MIT-licensed with full source code on GitHub and a demonstration video included as supplementary material.
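To make the coalition idea in the abstract concrete, the following is a minimal sketch of exact Shapley value computation over a small set of "players", where each player is either a text token or a group of audio frames. It is not the mllm-shap API: the names `exact_shapley`, `toy_value`, `TEXT`, and `GROUPS` are hypothetical, and `toy_value` is a hand-written stand-in for the score a real model would produce on a masked input.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value_fn):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Shapley weight for a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                phi[p] += weight * (value_fn(s | {p}) - value_fn(s))
    return phi


# Hypothetical input: two text tokens and six audio frames,
# with the frames grouped into two units (assumed grouping).
TEXT = ["t0", "t1"]
GROUPS = {"g0": ["a0", "a1", "a2"], "g1": ["a3", "a4", "a5"]}


def toy_value(coalition):
    """Stand-in for a model score on the input with only `coalition` unmasked."""
    score = 0.0
    if "t0" in coalition:
        score += 1.0
    if "g0" in coalition:
        score += 2.0
    # Cross-modal interaction: only pays off when both parts are present.
    if "t1" in coalition and "g1" in coalition:
        score += 4.0
    return score


players = TEXT + list(GROUPS)
phi = exact_shapley(players, toy_value)

# Size of the coalition space with and without grouping the audio frames.
n_frames = sum(len(frames) for frames in GROUPS.values())
coalitions_ungrouped = 2 ** (len(TEXT) + n_frames)
coalitions_grouped = 2 ** len(players)
print(phi, coalitions_ungrouped, coalitions_grouped)
```

In this toy setting the interaction term is split evenly between `t1` and `g1`, and the attributions sum to the score of the full input minus the score of the empty input. Treating each audio group as one player shrinks the number of coalitions from 2^8 to 2^4 here; the reduction factor in a real system depends on how many frames each group covers.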
Anthology ID:
2026.acl-demo.38
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Greg Durrett, Ping Jian
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
387–396
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-demo.38/
Cite (ACL):
Jakub Muszyński, Paweł Pozorski, and Maria Ganzha. 2026. mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 387–396, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models (Muszyński et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-demo.38.pdf