Antonin Poché
2026
Interpreto: An Explainability Library for Transformers
Antonin Poché | Thomas Mullor | Gabriele Sarti | Frédéric Boisnard | Corentin Friedrich | Charlotte Claye | Francois Hoofd | Raphael Bernas | Nicholas Asher | Celine Hudelot | Fanny Jourdan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Interpreto is an open-source Python library for interpreting HuggingFace language models, from early BERT variants to LLMs. It provides two complementary families of methods: attribution methods and concept-based explanations. The library bridges recent research and practical tooling by exposing explanation workflows through a unified API for both classification and text generation. A key differentiator is its end-to-end concept-based pipeline (from activation extraction to concept learning, interpretation, and scoring), which goes beyond feature-level attributions and is uncommon in existing libraries.
2025
ConSim: Measuring Concept-Based Explanations’ Effectiveness with Automated Simulatability
Antonin Poché | Alon Jacovi | Agustin Martin Picard | Victor Boutin | Fanny Jourdan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Concept-based explanations work by mapping complex model computations to human-understandable concepts. Evaluating such explanations is very difficult, as it includes not only the quality of the induced space of possible concepts but also how effectively the chosen concepts are communicated to users. Existing evaluation metrics often focus solely on the former, neglecting the latter. We introduce an evaluation framework for measuring concept explanations via automated simulatability: a simulator’s ability to predict the explained model’s outputs based on the provided explanations. This approach accounts for both the concept space and its interpretation in an end-to-end evaluation. Human studies for simulatability are notoriously difficult to enact, particularly at the scale of a wide, comprehensive empirical evaluation (which is the subject of this work). We propose using large language models (LLMs) as simulators to approximate the evaluation and report various analyses to make such approximations reliable. Our method allows for scalable and consistent evaluation across various models and datasets. We report a comprehensive empirical evaluation using this framework and show that LLMs provide consistent rankings of explanation methods. Code available at Anonymous GitHub.