Eliciting Textual Descriptions from Representations of Continuous Prompts

Daniela Gottesman, Mor Geva, Dana Ramati


Abstract
Continuous prompts, or "soft prompts", are a widely adopted parameter-efficient tuning strategy for large language models, but their opaque nature often makes them less favorable. Prior attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space. However, this approach is problematic: performant prompts can project onto arbitrary or even contradictory text, and each prompt token is interpreted in isolation. In this work, we propose a new approach that interprets continuous prompts by eliciting textual descriptions from their representations during model inference. Using a variant of Patchscopes (Ghandeharioun et al., 2024) called InSPEcT over various tasks, we show our method often yields accurate task descriptions, which become more faithful as task performance increases. Moreover, an elaborated version of InSPEcT reveals biased features in continuous prompts, whose presence correlates with biased model predictions. Providing an effective interpretability solution, InSPEcT can be leveraged to debug unwanted properties in continuous prompts and inform developers of ways to mitigate them.
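
The abstract describes InSPEcT only at a high level; the sketch below illustrates the general Patchscopes-style operation it builds on, under stated assumptions: run the model on a continuous prompt, store a hidden state from that pass, patch it into a placeholder position of a natural-language inspection prompt, and decode the continuation as a description. The model (gpt2), layer choice, inspection prompt, and the random stand-in for a tuned soft prompt are all illustrative assumptions, not the paper's exact setup.

    # Minimal sketch of a Patchscopes-style inspection pass (hypothetical
    # setup, not InSPEcT's exact procedure).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tok = AutoTokenizer.from_pretrained("gpt2")
    model.eval()

    LAYER = 6      # layer whose hidden state we read and patch (illustrative)
    PATCH_POS = 0  # position in the inspection prompt to overwrite

    # Source pass: run the model on a continuous prompt (a random stand-in
    # for a tuned soft prompt) and store the hidden state at the last position.
    soft_prompt = torch.randn(1, 10, model.config.n_embd)
    with torch.no_grad():
        src = model(inputs_embeds=soft_prompt, output_hidden_states=True)
    patched_state = src.hidden_states[LAYER][:, -1, :]

    # Target pass: overwrite one position of a natural-language inspection
    # prompt with the stored state, then decode the model's continuation.
    inspect_ids = tok("X. The task of this prompt is to", return_tensors="pt").input_ids

    def patch_hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        if hidden.size(1) > 1:  # patch the prefill pass only, not cached decode steps
            hidden[:, PATCH_POS, :] = patched_state
        return output

    # hidden_states[LAYER] is the output of block LAYER - 1, so hook that block.
    handle = model.transformer.h[LAYER - 1].register_forward_hook(patch_hook)
    with torch.no_grad():
        out = model.generate(inspect_ids, max_new_tokens=20, do_sample=False)
    handle.remove()

    print(tok.decode(out[0][inspect_ids.shape[1]:], skip_special_tokens=True))

The guard inside the hook matters when generation uses a key-value cache: later decode steps pass sequences of length one through each block, and patching those would corrupt every new token rather than just the placeholder position.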
Anthology ID: 2025.findings-acl.849
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 16545–16562
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.849/
Cite (ACL): Daniela Gottesman, Mor Geva, and Dana Ramati. 2025. Eliciting Textual Descriptions from Representations of Continuous Prompts. In Findings of the Association for Computational Linguistics: ACL 2025, pages 16545–16562, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Eliciting Textual Descriptions from Representations of Continuous Prompts (Gottesman et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.849.pdf