Daniel Takabi




2025

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
Javad Rafiei Asl | Sidhant Narula | Mohammad Ghasemigol | Eduardo Blanco | Daniel Takabi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) have revolutionized natural language processing, yet remain vulnerable to jailbreak attacks, particularly multi-turn jailbreaks that distribute malicious intent across benign exchanges and thereby bypass alignment mechanisms. Existing approaches often suffer from limited exploration of the adversarial space, rely on hand-crafted heuristics, or lack systematic query refinement. We propose NEXUS (Network Exploration for eXploiting Unsafe Sequences), a modular framework for constructing, refining, and executing optimized multi-turn attacks. NEXUS comprises: (1) ThoughtNet, which hierarchically expands a harmful intent into a structured semantic network of topics, entities, and query chains; (2) a feedback-driven Simulator that iteratively refines and prunes these chains through attacker–victim–judge LLM collaboration using harmfulness and semantic-similarity benchmarks; and (3) a Network Traverser that adaptively navigates the refined query space for real-time attacks. This pipeline systematically uncovers stealthy, high-success adversarial paths across LLMs. Our experimental results on several closed-source and open-source LLMs show that NEXUS achieves attack success rates 2.1% to 19.4% higher than those of state-of-the-art approaches. Our source code is available at https://github.com/inspire-lab/NEXUS.
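
As a rough illustration of the feedback-driven Simulator stage described above, the sketch below shows one way an attacker–victim–judge refinement loop over multi-turn query chains could be organized. The class name, callables, and thresholds are illustrative assumptions, not the NEXUS API; the actual implementation is in the linked repository.

    # Illustrative sketch only: names, interfaces, and thresholds are assumptions,
    # not the NEXUS codebase (see https://github.com/inspire-lab/NEXUS for the real thing).
    from dataclasses import dataclass
    from typing import Callable, List, Optional, Tuple

    @dataclass
    class QueryChain:
        turns: List[str]          # ordered, benign-looking sub-queries
        harm_score: float = 0.0
        sim_score: float = 0.0

    def refine_chain(chain: QueryChain,
                     attacker: Callable[[QueryChain, str], QueryChain],
                     victim: Callable[[List[str]], str],
                     judge: Callable[[str, List[str]], Tuple[float, float]],
                     max_rounds: int = 5,
                     harm_threshold: float = 0.8,
                     sim_threshold: float = 0.7) -> Optional[QueryChain]:
        """Iteratively refine one multi-turn chain; prune it if it never passes both checks."""
        for _ in range(max_rounds):
            response = victim(chain.turns)                  # run the multi-turn conversation
            chain.harm_score, chain.sim_score = judge(response, chain.turns)
            if chain.harm_score >= harm_threshold and chain.sim_score >= sim_threshold:
                return chain                                # keep this adversarial path
            chain = attacker(chain, response)               # rewrite the turns using feedback
        return None                                         # prune: the chain never succeeded

In the full pipeline, the chains that survive this refinement would then be handed to the Network Traverser for real-time execution against the target model.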

2024

RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
Javad Rafiei Asl | Prajwal Panzade | Eduardo Blanco | Daniel Takabi | Zhipeng Cai
Findings of the Association for Computational Linguistics: NAACL 2024

Pre-trained language models (PLMs) have consistently demonstrated outstanding performance across a diverse spectrum of natural language processing tasks. Nevertheless, despite their success with unseen data, current PLM-based representations often exhibit poor robustness in adversarial settings. In this paper, we introduce RobustSentEmbed, a self-supervised sentence embedding framework designed to improve both generalization and robustness across diverse text representation tasks and against a broad set of adversarial attacks. Through the generation of high-risk adversarial perturbations and their utilization in a novel objective function, RobustSentEmbed adeptly learns high-quality and robust sentence embeddings. Our experiments confirm the superiority of RobustSentEmbed over state-of-the-art representations. Specifically, our framework achieves a significant reduction in the success rate of various adversarial attacks, notably cutting the BERTAttack success rate almost in half (from 75.51% to 38.81%). The framework also yields improvements of 1.59% and 0.23% on semantic textual similarity tasks and various transfer tasks, respectively.
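
For intuition, the following PyTorch sketch illustrates the general pattern the abstract describes: deriving a high-risk perturbation in embedding space from the gradient of a contrastive loss, then training the encoder to stay invariant to it. The InfoNCE form, the two-view setup, and all hyperparameters are assumptions for illustration, not the paper's exact objective.

    # Minimal sketch, assuming `encoder(batch)` returns (batch, dim) sentence embeddings
    # and two forward passes give two stochastic views (e.g. via dropout).
    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, temperature=0.05):
        # In-batch contrastive loss between two views of the same sentences.
        sims = F.cosine_similarity(anchor.unsqueeze(1), positive.unsqueeze(0), dim=-1)
        labels = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(sims / temperature, labels)

    def adversarial_contrastive_step(encoder, batch, optimizer, epsilon=1e-2):
        emb_clean = encoder(batch)            # first view
        emb_view = encoder(batch)             # second view (dropout noise)

        # High-risk perturbation: gradient of the contrastive loss w.r.t. the embeddings.
        emb_adv = emb_clean.detach().clone().requires_grad_(True)
        loss_adv = info_nce(emb_adv, emb_view.detach())
        grad, = torch.autograd.grad(loss_adv, emb_adv)
        delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)

        # Train the encoder so clean and perturbed views stay aligned (robust invariance).
        loss = info_nce(emb_clean, emb_view) + info_nce(emb_clean + delta, emb_view)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In this framing, the gradient-derived direction plays the role of the "high-risk adversarial perturbation" mentioned above, and robustness comes from keeping the perturbed and clean views close in the embedding space.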

2023

RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training
Javad Asl | Eduardo Blanco | Daniel Takabi
Findings of the Association for Computational Linguistics: EMNLP 2023

Pre-trained language models (PLMs) have demonstrated exceptional performance across a wide range of natural language processing tasks. The utilization of PLM-based sentence embeddings enables the generation of contextual representations that capture rich semantic information. However, despite their success with unseen samples, current PLM-based representations suffer from poor robustness in adversarial scenarios. In this paper, we propose RobustEmbed, a self-supervised sentence embedding framework that enhances both generalization and robustness in various text representation tasks and against diverse adversarial attacks. By generating high-risk adversarial perturbations to promote higher invariance in the embedding space and leveraging these perturbations within a novel contrastive objective, RobustEmbed effectively learns high-quality sentence embeddings. Our extensive experiments validate the superiority of RobustEmbed over previous state-of-the-art self-supervised representations in adversarial settings, while also showcasing relative improvements on seven semantic textual similarity (STS) tasks and six transfer tasks. Specifically, our framework achieves a significant reduction in attack success rate, from 75.51% to 39.62%, for the BERTAttack technique, along with gains of 1.20% and 0.40% on STS tasks and transfer tasks, respectively.
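
Since robustness in both papers above is reported as attack success rate (ASR), a small hedged sketch of how ASR is typically computed may help interpret the 75.51% to 39.62% drop. The attack and prediction interfaces below are hypothetical placeholders, not any specific library's API.

    # Hypothetical evaluation sketch: attack success rate of a text attack against a
    # classifier built on top of sentence embeddings. Interfaces are illustrative only.
    from typing import Callable, List

    def attack_success_rate(texts: List[str],
                            labels: List[int],
                            predict: Callable[[str], int],
                            attack: Callable[[str, int], str]) -> float:
        """Fraction of correctly classified inputs whose prediction flips under attack."""
        attempted, succeeded = 0, 0
        for text, label in zip(texts, labels):
            if predict(text) != label:
                continue                     # only attack inputs the model already gets right
            attempted += 1
            adversarial = attack(text, label)
            if predict(adversarial) != label:
                succeeded += 1               # the attack changed the prediction
        return succeeded / attempted if attempted else 0.0

    # e.g. an ASR drop from 0.7551 to 0.3962 under BERTAttack would indicate that the
    # embedding space has become substantially harder to perturb successfully.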