LongMP-Bench: A Benchmark for Multimodal Persona Understanding in Long-Term Dialogues

Zhuoqun Li, Zhaopei Huang, Wenxuan Wang, Qin Jin


Abstract
Understanding multimodal user personas in long-term dialogues is essential for building personalized and human-like dialogue systems. However, existing datasets suffer from limited persona diversity and static, overly simplified settings, making them insufficient for capturing the complexity of real-world interactions. To address these limitations, we introduce LongMP-Bench, a benchmark designed to evaluate the capabilities of models in understanding evolving user personas within long-term multimodal dialogues. We present a multi-step, scalable data construction pipeline that generates long-term interaction records centered around multimodal personas, followed by human refinement for quality assurance. The resulting dataset contains long conversations from 150 users, each exhibiting visual consistency and dynamic persona development over time. Built on this dataset, we define a suite of tasks to comprehensively assess models’ ability to track persona evolution, integrate visual and textual inputs, and apply persona understanding in realistic dialogue scenarios. Extensive experiments on LongMP-Bench highlight the substantial challenges in multimodal persona understanding, especially in tracking persona shifts and leveraging multimodal context effectively. We will release our benchmark and code to facilitate future research in multimodal and personalized dialogue systems.
Anthology ID:
2026.findings-acl.1159
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23132–23160
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1159/
Cite (ACL):
Zhuoqun Li, Zhaopei Huang, Wenxuan Wang, and Qin Jin. 2026. LongMP-Bench: A Benchmark for Multimodal Persona Understanding in Long-Term Dialogues. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23132–23160, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
LongMP-Bench: A Benchmark for Multimodal Persona Understanding in Long-Term Dialogues (Li et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1159.pdf
Checklist:
2026.findings-acl.1159.checklist.pdf