From Trust to Compromise: Outcome-Verified LLM Phishing Simulation and Real-Time Defense

Tulika Tewari; Nalin Asanka Gamagedara Arachchilage; Jagat Sesh Challa; Dhruv Kumar

Abstract

Large Language Models (LLMs) excel as conversational agents. However, these capabilities can be weaponized to automate social engineering attacks that gradually build rapport to compromise the online safety of users. To study this threat, researchers have simulated LLM-based attacks in controlled settings. However, existing simulators focus only on requests for Personally Identifiable Information (PII) within the chat. To represent a complete attack scenario, we introduce PhishSim, an outcome-driven LLM-based phishing simulator that verifies compromise by simulating a victim completing an external action step, such as submitting credentials on a malicious platform. This enables the generation of diverse, multi-turn attack trajectories. Building on these trajectories, we position PhishGate as a practical mitigation baseline for outcome-grounded conversational phishing: a real-time multi-agent risk scorer that detects manipulation tactics and estimates the severity of ongoing chats. For ambiguous cases, it invokes consistency checks supported by retrieval-augmented generation (RAG). Evaluating four state-of-the-art LLM backends in a real-time setting, we find that PhishGate improves dialogue-level detection over a real-time baseline. Our results highlight both the promise and the brittleness of LLM-based real-time phishing defense, and provide an outcome-grounded testbed for studying conversational compromise.
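The abstract describes PhishGate only at a high level: a scorer that updates risk as each chat turn arrives and, when the evidence is ambiguous, defers to a secondary consistency check. The Python sketch below illustrates that general control flow under loudly stated assumptions; it is not the authors' implementation. The tactic names, keyword cues, weights, thresholds, and the `RiskScorer` class are all hypothetical. Simple keyword matching stands in for the paper's LLM agents, and `consistency_check` is a placeholder for the RAG-supported check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tactic cues; keyword matching is a stand-in for LLM-based tactic detection.
TACTIC_CUES: Dict[str, List[str]] = {
    "urgency": ["right now", "immediately", "expires"],
    "authority": ["security team", "it department", "bank"],
    "credential_request": ["password", "login", "verify your account"],
}

# Hypothetical severity weight contributed by each detected tactic.
TACTIC_WEIGHTS: Dict[str, float] = {
    "urgency": 0.2,
    "authority": 0.2,
    "credential_request": 0.5,
}


@dataclass
class RiskScorer:
    flag_threshold: float = 0.7  # hypothetical: cumulative score at or above this is flagged
    ambiguous_low: float = 0.4   # hypothetical: lower bound of the ambiguous band
    # Placeholder for a RAG-supported consistency check over the dialogue so far.
    consistency_check: Callable[[List[str]], bool] = lambda turns: False
    history: List[str] = field(default_factory=list)
    score: float = 0.0

    def detect_tactics(self, turn: str) -> List[str]:
        """Return the hypothetical tactics whose cues appear in this turn."""
        text = turn.lower()
        return [t for t, cues in TACTIC_CUES.items() if any(c in text for c in cues)]

    def observe(self, turn: str) -> str:
        """Update the running risk score with one incoming turn and return a verdict."""
        self.history.append(turn)
        for tactic in self.detect_tactics(turn):
            self.score = min(1.0, self.score + TACTIC_WEIGHTS[tactic])
        if self.score >= self.flag_threshold:
            return "flag"
        if self.score >= self.ambiguous_low:
            # Ambiguous band: defer to the secondary check instead of deciding on score alone.
            return "flag" if self.consistency_check(self.history) else "monitor"
        return "safe"


if __name__ == "__main__":
    scorer = RiskScorer()
    for turn in [
        "Hi! I'm from the security team.",
        "Your access expires soon, act right now.",
        "Please confirm your password at the link.",
    ]:
        print(scorer.observe(turn))  # prints safe, then monitor, then flag
```

Under these assumed weights, the example dialogue escalates from `safe` to `monitor` (the ambiguous band, where the default placeholder check declines to flag) to `flag` once a credential request appears. Scoring per turn rather than per dialogue is what makes the sketch "real-time": a verdict is available before the conversation ends.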

Anthology ID: 2026.acl-long.543
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 11831–11845
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.543/
Cite (ACL): Tulika Tewari, Nalin Asanka Gamagedara Arachchilage, Jagat Sesh Challa, and Dhruv Kumar. 2026. From Trust to Compromise: Outcome-Verified LLM Phishing Simulation and Real-Time Defense. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11831–11845, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): From Trust to Compromise: Outcome-Verified LLM Phishing Simulation and Real-Time Defense (Tewari et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.543.pdf
Checklist: 2026.acl-long.543.checklist.pdf
