DIESEL: A Lightweight Inference-Time Safety Enhancement for Language Models

Ben Ganon, Alon Zolfi, Omer Hofman, Inderjeet Singh, Hisashi Kojima, Yuval Elovici, Asaf Shabtai


Abstract
Large language models (LLMs) have demonstrated impressive performance across a wide range of tasks, including open-ended dialogue, driving advancements in virtual assistants and other interactive systems. However, these models often generate outputs misaligned with human values, such as ethical norms and safety constraints, resulting in potentially harmful or inappropriate responses. While several techniques have been proposed to address this problem, they typically involve computationally intensive training procedures or introduce substantial inference-time latency. In this paper, we present DIESEL, a lightweight inference-guidance technique that can be seamlessly integrated into any autoregressive LLM to semantically filter undesirable content during generation. DIESEL guides generation by reranking token candidates according to their semantic similarity to predefined negative concepts in the latent space. It can serve either as a standalone safeguard or as an auxiliary defense layer, enhancing response safety without requiring model fine-tuning or additional data. We demonstrate DIESEL’s effectiveness on state-of-the-art conversational models, including in adversarial jailbreak scenarios. Furthermore, we show that DIESEL generalizes beyond safety applications, enabling flexible and domain-specific response filtering.
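To make the reranking idea in the abstract concrete, below is a minimal, illustrative sketch of DIESEL-style inference-time filtering, not the authors' implementation. It assumes a sentence-embedding model (here, sentence-transformers) as a stand-in for the latent-space encoder, a hypothetical list of (token, logprob) candidates from the base LLM at one decoding step, and an illustrative mixing weight alpha.

```python
# Illustrative sketch of DIESEL-style token reranking (not the paper's code).
# Assumes: a similarity encoder (sentence-transformers as a stand-in) and a list of
# (token_text, logprob) candidates produced by the base LLM at the current step.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in latent-space encoder

# Example negative concepts; the actual concept set is configurable per domain.
NEGATIVE_CONCEPTS = ["violence", "self-harm", "instructions for illegal activity"]
neg_emb = encoder.encode(NEGATIVE_CONCEPTS, normalize_embeddings=True)


def rerank_step(prefix: str, candidates: list[tuple[str, float]], alpha: float = 0.5) -> str:
    """Pick the next token by penalizing semantic proximity to negative concepts."""
    continuations = [prefix + tok for tok, _ in candidates]
    cand_emb = encoder.encode(continuations, normalize_embeddings=True)
    # Max cosine similarity of each candidate continuation to any negative concept.
    danger = (cand_emb @ neg_emb.T).max(axis=1)
    # Blend the model's own log-probability with the safety penalty.
    scored = [(lp - alpha * d, tok) for (tok, lp), d in zip(candidates, danger)]
    scored.sort(reverse=True)
    return scored[0][1]
```

Applied at every decoding step, a routine like this filters undesirable continuations without fine-tuning the base model; only an auxiliary embedding pass per step is added, which is the source of the "lightweight" claim.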
Anthology ID:
2025.findings-acl.1223
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23870–23890
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1223/
Cite (ACL):
Ben Ganon, Alon Zolfi, Omer Hofman, Inderjeet Singh, Hisashi Kojima, Yuval Elovici, and Asaf Shabtai. 2025. DIESEL: A Lightweight Inference-Time Safety Enhancement for Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 23870–23890, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
DIESEL: A Lightweight Inference-Time Safety Enhancement for Language Models (Ganon et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1223.pdf