Yehonatan Elisha

2025

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Nurit Cohen Inger | Yehonatan Elisha | Bracha Shapira | Lior Rokach | Seffi Cohen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) often appear to excel on public benchmarks, but these high scores may mask an overreliance on dataset-specific surface cues rather than true language understanding. We introduce the **Chameleon Benchmark Overfit Detector (C-BOD)**, a meta-evaluation framework designed to reveal such overfitting. C-BOD systematically rephrases benchmark inputs via a parameterized transformation that preserves semantic content and labels, enabling the detection of performance degradation indicative of superficial pattern reliance. We conduct extensive experiments across two datasets, three rephrasing models, and multiple distortion levels, evaluating 32 state-of-the-art LLMs. On the MMLU benchmark, C-BOD reveals an average performance drop of 2.75% under modest rephrasings, with over 80% of models exhibiting statistically significant differences. Notably, higher-performing models and larger LLMs tend to show greater sensitivity, suggesting a deeper dependence on benchmark-specific phrasing. Due to its dataset- and model-agnostic design, C-BOD can be easily integrated into evaluation pipelines and offers a promising foundation for overfitting mitigation strategies. Our findings challenge the community to look beyond leaderboard scores and prioritize resilience and generalization in LLM evaluation. Our code and benchmark datasets are available at: https://github.com/nuritci/cbod
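
The comparison described in this abstract, one model scored on original and rephrased versions of the same items, can be illustrated with a minimal sketch. The abstract does not name the significance test, so the exact McNemar test below is an assumption chosen because the two runs are paired; the function names and the toy data are hypothetical and are not taken from the C-BOD repository.

```python
from math import comb

def accuracy_drop(orig_correct, reph_correct):
    """Difference in accuracy (original minus rephrased) over paired items."""
    n = len(orig_correct)
    return (sum(orig_correct) - sum(reph_correct)) / n

def mcnemar_exact_p(orig_correct, reph_correct):
    """Two-sided exact McNemar p-value (assumed test, not confirmed by the abstract).

    Only discordant pairs matter: items answered correctly on exactly one
    of the two versions of the prompt.
    """
    pairs = list(zip(orig_correct, reph_correct))
    b = sum(1 for o, r in pairs if o and not r)  # correct only on the original
    c = sum(1 for o, r in pairs if r and not o)  # correct only on the rephrasing
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-item correctness flags for one model (1 = correct).
original = [1] * 10
rephrased = [0] * 8 + [1] * 2

drop = accuracy_drop(original, rephrased)
p_value = mcnemar_exact_p(original, rephrased)
```

A large drop with a small p-value on paired items would be the kind of signal the abstract describes as indicative of reliance on benchmark-specific phrasing.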

2024

Improving LLM Attributions with Randomized Path-Integration
Oren Barkan | Yehonatan Elisha | Yonatan Toib | Jonathan Weill | Noam Koenigstein
Findings of the Association for Computational Linguistics: EMNLP 2024

We present Randomized Path-Integration (RPI), a path-integration method for explaining language models via randomization of the integration path over the attention information in the model. RPI employs integration on internal attention scores and their gradients along a randomized path, which is dynamically established between a baseline representation and the attention scores of the model. The inherent randomness in the integration path originates from modeling the baseline representation as a randomly drawn tensor from a Gaussian diffusion process. As a consequence, RPI generates diverse baselines, yielding a set of candidate attribution maps. This set facilitates the selection of the most effective attribution map based on the specific metric at hand. We present an extensive evaluation, encompassing 11 explanation methods and 5 language models, including the Llama2 and Mistral models. Our results demonstrate that RPI outperforms the latest state-of-the-art methods across 4 datasets and 5 evaluation metrics.
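
The general idea of integrating gradients along a path from a random baseline, then choosing among the resulting candidate maps, can be sketched on a toy function. This is not the RPI implementation: a plain vector stands in for attention scores, independent Gaussian noise stands in for the Gaussian diffusion process, the path is a straight line, and all names (`path_attribution`, `randomized_attributions`, `select_best`, the toy `f` and `grad_f`) are hypothetical.

```python
import random

def path_attribution(grad_fn, target, baseline, steps=50):
    """Midpoint Riemann approximation of gradients integrated along the
    straight line from `baseline` to `target`."""
    avg_grad = [0.0] * len(target)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b + alpha * (t - b) for t, b in zip(target, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    # Scale the averaged gradient by the displacement from the baseline.
    return [(t - b) * g for t, b, g in zip(target, baseline, avg_grad)]

def randomized_attributions(grad_fn, target, n_baselines=8, sigma=1.0, seed=0):
    """Draw several Gaussian baselines (a simplifying assumption) and
    return one (baseline, attribution map) pair per draw."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_baselines):
        baseline = [rng.gauss(0.0, sigma) for _ in target]
        candidates.append((baseline, path_attribution(grad_fn, target, baseline)))
    return candidates

def select_best(candidates, metric):
    """Pick the candidate whose attribution map scores highest on `metric`."""
    return max(candidates, key=lambda pair: metric(pair[1]))

# Toy differentiable "model": a weighted sum of squares and its gradient.
WEIGHTS = [1.0, 2.0, 0.5]

def f(a):
    return sum(w * x * x for w, x in zip(WEIGHTS, a))

def grad_f(a):
    return [2 * w * x for w, x in zip(WEIGHTS, a)]

scores = [0.2, 0.5, 0.3]  # stand-in for attention scores
candidates = randomized_attributions(grad_f, scores, n_baselines=4, seed=1)
```

For each candidate, the attributions sum to `f(scores) - f(baseline)`, the usual completeness property of path-integrated gradients. A task-specific evaluation metric would be passed to `select_best` to choose among the candidates.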

LLM Explainability via Attributive Masking Learning
Oren Barkan | Yonatan Toib | Yehonatan Elisha | Jonathan Weill | Noam Koenigstein
Findings of the Association for Computational Linguistics: EMNLP 2024

In this paper, we introduce Attributive Masking Learning (AML), a method designed for explaining language model predictions by learning input masks. AML trains an attribution model to identify influential tokens in the input for a given language model’s prediction. The central concept of AML is to train an auxiliary attribution model to simultaneously 1) mask as much input data as possible while ensuring that the language model’s prediction closely aligns with its prediction on the original input, and 2) ensure a significant change in the model’s prediction when applying the inverse (complement) of the same mask to the input. This dual-masking approach further enables the optimization of the explanation w.r.t. the metric of interest. We demonstrate the effectiveness of AML on both encoder-based and decoder-based language models, showcasing its superiority over a variety of state-of-the-art explanation methods on multiple benchmarks.
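
The two goals in this abstract can be read as a single scalar objective: keep few tokens, stay close to the original prediction under the mask, and move far from it under the complement mask. The sketch below is one hypothetical form of such an objective, not the loss used in the paper. The KL divergence, the weights `lam_sparse` and `lam_inv`, and all function names are assumptions made for illustration.

```python
from math import log

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (assumed distance)."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dual_mask_loss(keep_mask, p_orig, p_masked, p_complement,
                   lam_sparse=1.0, lam_inv=1.0):
    """Hypothetical dual-masking objective; lower is better.

    keep_mask    -- per-token weights in [0, 1]; 1 keeps a token, 0 masks it
    p_orig       -- model prediction on the original input
    p_masked     -- prediction when only the kept tokens remain
    p_complement -- prediction when the inverse mask is applied
    """
    sparsity = sum(keep_mask) / len(keep_mask)  # goal 1: mask as much as possible
    fidelity = kl(p_orig, p_masked)             # goal 1: prediction stays close
    change = kl(p_orig, p_complement)           # goal 2: complement shifts prediction
    return lam_sparse * sparsity + fidelity - lam_inv * change

p_orig = [0.9, 0.1]
# A mask that keeps one influential token: prediction preserved, complement shifts it.
good = dual_mask_loss([1, 0, 0, 0], p_orig, [0.9, 0.1], [0.5, 0.5])
# A mask that keeps everything: nothing is hidden and the complement changes nothing.
trivial = dual_mask_loss([1, 1, 1, 1], p_orig, [0.9, 0.1], [0.9, 0.1])
```

Under these assumptions the informative mask receives a lower loss than the trivial one, which is the behavior the two stated goals call for.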

Co-authors

Nurit Cohen Inger 1

Lior Rokach 1

Bracha Shapira 1

Venues

findings 2
emnlp 1
