FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs

Debarpan Bhattacharya, Apoorva Kulkarni, Sriram Ganapathy


Abstract
Accurately assessing the trustworthiness of predictions generated by multimodal large language models (MLLMs), which can enable selective prediction and improve user confidence, is challenging due to the diverse multimodal input paradigms. We propose Functionally Equivalent Sampling for Trust Assessment (FESTA), a multimodal input sampling technique for MLLMs that derives an uncertainty measure from equivalent and complementary input samples. The proposed task-preserving sampling approach for uncertainty quantification expands the input space to probe the consistency (through equivalent samples) and sensitivity (through complementary samples) of the model. FESTA uses only input-output access to the model (black-box) and does not require ground truth (unsupervised). Experiments are conducted with various off-the-shelf multimodal LLMs on both visual and audio reasoning tasks. The proposed FESTA uncertainty estimate achieves significant improvement in selective prediction performance (33.3% relative improvement for vision-LLMs and 29.6% for audio-LLMs), measured by the area under the receiver operating characteristic curve (AUROC) for detecting mispredictions. The code implementation is open-sourced.
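The core idea in the abstract — probing a black-box model's consistency on functionally equivalent inputs and its sensitivity on complementary (answer-flipping) inputs, then combining the two probes into a confidence score — can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function names, the assumption that answers are comparable strings, and the simple product used to combine the two probes are all illustrative choices:

```python
def agreement(answers, reference):
    """Fraction of sampled answers that match the reference answer."""
    return sum(a == reference for a in answers) / len(answers)

def festa_style_confidence(model, query, equivalent_inputs, complementary_inputs):
    """Illustrative FESTA-style confidence score (black-box, unsupervised).

    - Consistency: a trustworthy model should give the SAME answer on
      functionally equivalent variants of the input.
    - Sensitivity: it should CHANGE its answer on complementary variants,
      i.e. inputs constructed so that the correct answer flips.

    `model` is any callable mapping an input to an answer string; no access
    to logits or ground truth is assumed.
    """
    reference = model(query)
    consistency = agreement([model(x) for x in equivalent_inputs], reference)
    # Sensitivity: fraction of complementary inputs whose answer differs.
    sensitivity = 1.0 - agreement([model(x) for x in complementary_inputs], reference)
    # Combine the two probes into one confidence score (product, as one
    # simple choice); low values flag predictions to abstain on.
    return consistency * sensitivity
```

A model that answers identically on equivalent variants and flips on complementary ones scores 1.0; a model that gives the same answer regardless of the input scores 0.0 on sensitivity and is flagged as untrustworthy. Scoring each test example this way and ranking by confidence yields the selective-prediction / AUROC evaluation described in the abstract.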
Anthology ID:
2025.findings-emnlp.657
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12277–12295
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.657/
DOI:
10.18653/v1/2025.findings-emnlp.657
Cite (ACL):
Debarpan Bhattacharya, Apoorva Kulkarni, and Sriram Ganapathy. 2025. FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 12277–12295, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs (Bhattacharya et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.657.pdf
Checklist:
2025.findings-emnlp.657.checklist.pdf