ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Amr Gomaa, Ahmed Salem, Sahar Abdelnabi


Abstract
As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent–agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing abstraction quality, while security attacks target tool use and preference manipulation. Evaluating seven state-of-the-art models reveals persistent vulnerabilities: privacy attacks succeed in up to 88% of cases and security breaches in up to 60%, with stronger models leaking more. By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.
Anthology ID:
2026.findings-eacl.170
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3246–3268
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.170/
Cite (ACL):
Amr Gomaa, Ahmed Salem, and Sahar Abdelnabi. 2026. ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations. In Findings of the Association for Computational Linguistics: EACL 2026, pages 3246–3268, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations (Gomaa et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.170.pdf
Checklist:
 2026.findings-eacl.170.checklist.pdf