@inproceedings{fu-barez-2025-question,
    title = "Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness",
    author = "Fu, Tingchen and
      Barez, Fazl",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1595/",
    doi = "10.18653/v1/2025.emnlp-main.1595",
    pages = "31293--31307",
    isbn = "979-8-89176-332-6",
    abstract = "Insensitivity to semantically-preserving variations of prompts (paraphrases) is crucial for reliable behavior and real-world deployment of large language models. However, language models exhibit significant performance degradation with semantically equivalent but differently phrased prompts, and existing solutions either depend on trial-and-error prompt engineering or require computationally expensive inference-time algorithms. In this study, built on the key insight that worst-case prompts exhibit a drift in embedding space, we present Latent Adversarial Paraphrasing (LAP), a dual-loop adversarial framework that optimizes a trainable perturbation as ``latent continuous paraphrase'' and language model performance on these perturbations iteratively. Extensive experiments are conducted to demonstrate the effectiveness of LAP across multiple backbones on the RobustAlpaca benchmark with a 0.5{\%}-4{\%} absolution improvement on worst-case win-rate."
}
Markdown (Informal)
[Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness](https://aclanthology.org/2025.emnlp-main.1595/) (Fu & Barez, EMNLP 2025)
ACL