Carlo Bretti
2026
Are Multimodal LLMs Movie Buffs?
Carlo Bretti | Pascal Mettes | Nanne Van Noord
Findings of the Association for Computational Linguistics: EACL 2026
No. While Multimodal Large Language Models (MLLMs) have been shown to perform very well on general video data, we systematically show that their performance on movies lags behind. This is surprising, as MLLMs are increasingly used for movie understanding. To measure the performance of MLLMs on movies, we explore three pillars of movie mastery: movie knowledge, cinematographic knowledge, and critical analysis. Through a combination of quantitative and in-depth qualitative evaluations, we identify where MLLMs show promise and, in particular, where they fail. Our findings show that in small-scale settings involving factual knowledge, MLLMs are able to outperform existing methods. However, once cinematographic and critical analysis are required, MLLMs cannot extract enough meaningful information from the visual modality to provide useful insights. The data and project page are available at https://carlobretti.github.io/moviebuff.