Ahmed Salem
2026
QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models
Maximilian Kreutner | Jens Rupprecht | Georg Ahnert | Ahmed Salem | Markus Strohmaier
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation (>40 million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers. We also find that answers can be obtained for a fraction of the compute cost by changing the presentation method. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without coding knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa | Ahmed Salem | Sahar Abdelnabi
Findings of the Association for Computational Linguistics: EACL 2026
As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent-to-agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and over 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing abstraction quality, while security attacks target tool use and preference manipulation. Evaluating seven state-of-the-art models reveals persistent vulnerabilities: privacy attacks succeed in up to 88% of cases and security breaches in up to 60%, with stronger models leaking more. By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.
2025
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang | Yicong Tan | Yun Shen | Ahmed Salem | Michael Backes | Savvas Zannettou | Yang Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recently, autonomous agents built on large language models (LLMs) have experienced significant development and are being deployed in real-world applications. By using tools, these systems can perform actions in the real world. Given the agents’ practical applications and ability to execute consequential actions, such autonomous systems can cause more severe damage than a standalone LLM if compromised. While some existing research has explored harmful actions by LLM agents, our study approaches the vulnerability from a different perspective. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios. Through attacks on implemented and deployable agents in multi-agent scenarios, we highlight the realistic risks associated with these vulnerabilities. To mitigate such attacks, we propose self-examination defense methods. Our findings indicate that these attacks are more difficult to detect than previous, overtly harmful attacks, underscoring the substantial risks associated with this vulnerability.
2024
Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models
Adel Elmahdy | Ahmed Salem
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
Natural language processing (NLP) models have become increasingly popular in real-world applications, such as text classification. However, they are vulnerable to privacy attacks, including data reconstruction attacks that aim to extract the data used to train the model. Most previous studies on data reconstruction attacks have focused on LLMs, while classification models were assumed to be more secure. In this work, we propose a new targeted data reconstruction attack called the Mix And Match attack, which takes advantage of the fact that most classification models are based on LLMs. The Mix And Match attack uses the base model of the target model to generate candidate tokens and then prunes them using the classification head. We extensively demonstrate the effectiveness of the attack using both random and organic canaries. This work highlights the importance of considering the privacy risks associated with data reconstruction attacks in classification models and offers insights into possible leakages.
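The generate-then-prune loop described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no algorithmic details beyond the two steps, so the beam-style search, the function name `mix_and_match_sketch`, the bigram table standing in for the base model, and the overlap score standing in for the classification head are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def mix_and_match_sketch(
    prefix: Sequence[str],
    propose: Callable[[Sequence[str]], List[str]],
    score: Callable[[Sequence[str]], float],
    steps: int,
    keep: int = 2,
) -> List[Tuple[str, ...]]:
    """Extend `prefix` token by token: propose candidates, then prune by score."""
    beams: List[Tuple[str, ...]] = [tuple(prefix)]
    for _ in range(steps):
        # "Mix": the base-model stand-in proposes next-token candidates.
        expanded = [beam + (tok,) for beam in beams for tok in propose(beam)]
        if not expanded:
            break
        # "Match": the classifier stand-in ranks candidates; keep the best few.
        expanded.sort(key=score, reverse=True)
        beams = expanded[:keep]
    return beams


# Hypothetical stand-in for a base language model: a fixed bigram table.
BIGRAMS = {
    "my": ["password", "name", "dog"],
    "password": ["is", "was"],
    "name": ["is"],
    "dog": ["is"],
    "is": ["hunter2", "secret", "Bob"],
    "was": ["hunter2"],
}

# Hypothetical canary that the toy "classifier" is assumed to have memorized.
CANARY = ("my", "password", "is", "hunter2")


def toy_propose(seq: Sequence[str]) -> List[str]:
    return BIGRAMS.get(seq[-1], [])


def toy_score(seq: Sequence[str]) -> float:
    # Stand-in for classification-head confidence: positional overlap with the canary.
    return sum(a == b for a, b in zip(seq, CANARY)) / len(CANARY)


result = mix_and_match_sketch(["my"], toy_propose, toy_score, steps=3, keep=2)
print(result[0])
```

In a real setting the proposal function would presumably query the target's base language model and the scoring function would query the fine-tuned classification head; both are replaced here by deterministic toys so the control flow is easy to follow.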