@inproceedings{bonnier-2025-error,
    title = "Error Detection for Multimodal Classification",
    author = "Bonnier, Thomas",
    editor = "Cao, Trista  and
      Das, Anubrata  and
      Kumarage, Tharindu  and
      Wan, Yixin  and
      Krishna, Satyapriya  and
      Mehrabi, Ninareh  and
      Dhamala, Jwala  and
      Ramakrishna, Anil  and
      Galstyan, Aram  and
      Kumar, Anoop  and
      Gupta, Rahul  and
      Chang, Kai-Wei",
    booktitle = "Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)",
    month = may,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.trustnlp-main.6/",
    doi = "10.18653/v1/2025.trustnlp-main.6",
    pages = "66--81",
    ISBN = "979-8-89176-233-6",
    abstract = "Machine learning models have proven useful in key applications such as autonomous driving and diagnosis prediction. When a model is deployed under real-world conditions, it is therefore essential to detect potential errors with a trustworthy approach. This monitoring practice makes decision-making safer by avoiding catastrophic failures. In this paper, the focus is on multimodal classification. We introduce a method that addresses error detection based on unlabeled data. It leverages fused representations and computes the probability that a model will fail based on detected fault patterns in validation data. To improve transparency, we employ a sampling-based approximation of Shapley values in multimodal settings in order to explain why a prediction is assessed as erroneous in terms of feature values. Further, as explanation methods can sometimes disagree, we suggest evaluating the consistency of explanations produced by different value functions and algorithms. To show the relevance of our method, we evaluate it against 9 baselines from various domains on tabular-text and text-image datasets, using 2 multimodal fusion strategies for the classification models. Lastly, we show the usefulness of our explanation algorithm on misclassified samples."
}