Marcin Detyniecki


2025

SAKE: Steering Activations for Knowledge Editing
Marco Scialanga | Thibault Laugel | Vincent Grari | Marcin Detyniecki
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As Large Language Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including a lack of contextual robustness and a failure to generalize to logical implications of the edited fact. To overcome these issues, we propose SAKE, an activation-steering method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM's behavior over a whole fact-related distribution, defined by paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE performs more robust edits than its existing counterparts.
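The abstract describes steering activations over a fact-related distribution via Optimal Transport. As a minimal illustrative sketch (not the paper's implementation), one can assume Gaussian approximations of the pre- and post-edit activation distributions, under which the OT map has a closed form (the affine Bures-Wasserstein Monge map). All function names and arrays below are hypothetical.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(X_src, X_tgt, reg=1e-6):
    """Closed-form optimal transport (Monge) map between Gaussian
    approximations of two activation distributions.

    X_src: (n, d) activations collected over the fact-related distribution
           (paraphrases and logical implications) before the edit.
    X_tgt: (m, d) activations associated with the desired edited answer.
    Returns a function mapping an activation vector to its steered version.
    """
    mu_s, mu_t = X_src.mean(axis=0), X_tgt.mean(axis=0)
    d = X_src.shape[1]
    cov_s = np.cov(X_src, rowvar=False) + reg * np.eye(d)  # regularize for stability
    cov_t = np.cov(X_tgt, rowvar=False) + reg * np.eye(d)
    cov_s_half = np.real(sqrtm(cov_s))
    cov_s_half_inv = np.linalg.inv(cov_s_half)
    # Linear part of the Gaussian OT map:
    # A = Σs^{-1/2} (Σs^{1/2} Σt Σs^{1/2})^{1/2} Σs^{-1/2}
    A = cov_s_half_inv @ np.real(sqrtm(cov_s_half @ cov_t @ cov_s_half)) @ cov_s_half_inv
    return lambda h: mu_t + (h - mu_s) @ A  # A is symmetric, so A == A.T

# Hypothetical usage: acts_old / acts_new would be hidden states gathered
# from the model on the fact-related prompts before and after the edit;
# the map is then applied to activations at a chosen layer at inference.
# T = gaussian_ot_map(acts_old, acts_new)
# h_steered = T(h_layer)
```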

Metric assessment protocol in the context of answer fluctuation on MCQ tasks
Ekaterina Goliakova | Xavier Renard | Marie-Jeanne Lesot | Thibault Laugel | Christophe Marsala | Marcin Detyniecki
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

Using multiple-choice questions (MCQs) has become a standard way to assess LLM capabilities efficiently, and a variety of metrics can be employed for this task. However, previous research has not assessed them thoroughly. At the same time, MCQ evaluation suffers from answer fluctuation: models produce different answers given slight changes in prompts. We propose a metric assessment protocol in which evaluation methodologies are analyzed through their connection with fluctuation rates, as well as with original performance. Our results show a strong link between existing metrics and answer fluctuation, even when the metrics are computed without any additional prompt variants. The highest association under the protocol is demonstrated by a novel metric, worst accuracy.
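To make the two quantities concrete, here is a sketch of how a worst-case accuracy and a fluctuation rate could be computed over prompt variants. This is one plausible reading of the abstract; the paper's exact definitions may differ, and the data below is purely illustrative.

```python
from typing import Dict, List

def worst_accuracy(answers: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Worst-case accuracy: a question counts as correct only if every
    prompt variant elicits the gold answer.

    answers: question id -> model answers across prompt variants
    gold:    question id -> gold answer
    """
    correct = [all(a == gold[q] for a in variants) for q, variants in answers.items()]
    return sum(correct) / len(correct)

def fluctuation_rate(answers: Dict[str, List[str]]) -> float:
    """Fraction of questions whose answer changes across prompt variants."""
    fluctuating = [len(set(variants)) > 1 for variants in answers.values()]
    return sum(fluctuating) / len(fluctuating)

# Illustrative example: "q1" is answered consistently, "q2" fluctuates.
answers = {"q1": ["B", "B", "B"], "q2": ["A", "C", "A"]}
gold = {"q1": "B", "q2": "A"}
print(worst_accuracy(answers, gold))  # 0.5
print(fluctuation_rate(answers))      # 0.5
```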