Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Parishad BehnamGhader; Nicholas Meade; Siva Reddy

doi:10.18653/v1/2025.findings-acl.673

Abstract

Instruction-following retrievers have been widely adopted alongside LLMs in real-world applications, but little work has investigated the safety risks surrounding their increasing search capabilities. We empirically study the ability of retrievers to satisfy malicious queries, both when used directly and when used in a retrieval-augmented generation-based setup. Concretely, we investigate six leading retrievers, including NV-Embed and LLM2Vec, and find that given malicious requests, most retrievers can (for >50% of queries) select relevant harmful passages. For example, LLM2Vec correctly selects passages for 61.35% of our malicious queries. We further uncover an emerging risk with instruction-following retrievers, where highly relevant harmful information can be surfaced by exploiting their instruction-following capabilities. Finally, we show that even safety-aligned LLMs, such as Llama3, can satisfy malicious requests when provided with harmful retrieved passages in-context. In summary, our findings underscore the malicious misuse risks associated with increasing retriever capability.

Anthology ID: 2025.findings-acl.673
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12962–12980
URL: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.673/
DOI: 10.18653/v1/2025.findings-acl.673
Cite (ACL): Parishad BehnamGhader, Nicholas Meade, and Siva Reddy. 2025. Exploiting Instruction-Following Retrievers for Malicious Information Retrieval. In Findings of the Association for Computational Linguistics: ACL 2025, pages 12962–12980, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Exploiting Instruction-Following Retrievers for Malicious Information Retrieval (BehnamGhader et al., Findings 2025)
PDF: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.673.pdf
