Superfluous Instruction: Vulnerabilities Stemming from Task-Specific Superficial Expressions in Instruction Templates

Toma Suzuki, Yusuke Sakai, Justin Vasselli, Hidetaka Kamigaito, Taro Watanabe


Abstract
Large language models (LLMs) achieve high performance through instruction-tuning, which involves learning various tasks using instruction templates. However, these templates often contain task-specific expressions: words that frequently appear in certain contexts but do not always convey the actual meaning of that context, even if they seem closely related to the target task. Biases inherent in such instruction templates may be learned by LLMs during training, potentially degrading performance when the models encounter superficial expressions. In this study, we propose a method that adds instructions to FLAN templates without altering the base instruction, producing “superfluous instructions”. This allows us to investigate the vulnerabilities of LLMs caused by overfitting to task-specific expressions embedded in instruction templates. The experimental results revealed that including superficial words strongly related to each task in the instruction text can alter the output, regardless of the intended meaning.
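The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the template text, the extra sentence, the choice to append rather than prepend, and the helper names `add_superfluous_instruction` and `flip_rate` are all assumptions made for illustration. It shows a base template left untouched while an extra sentence containing a task-related word is attached, plus a simple way to count how often outputs change.

```python
from typing import Sequence


def add_superfluous_instruction(template: str, extra: str) -> str:
    """Attach an extra sentence after the template; the base text is unchanged.

    Appending (rather than prepending) is an assumption of this sketch.
    """
    return f"{template}\n{extra.strip()}"


def flip_rate(base_outputs: Sequence[str], augmented_outputs: Sequence[str]) -> float:
    """Fraction of examples whose output differs between the two prompt variants."""
    if len(base_outputs) != len(augmented_outputs):
        raise ValueError("output lists must have the same length")
    if not base_outputs:
        return 0.0
    changed = sum(a != b for a, b in zip(base_outputs, augmented_outputs))
    return changed / len(base_outputs)


# Hypothetical entailment-style template (not taken from the paper).
base_template = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?"
)

# Hypothetical superfluous sentence: it mentions another task's vocabulary
# ("sentiment") without changing what the model is asked to do.
extra = "Note that this is not a sentiment analysis task."

augmented_template = add_superfluous_instruction(base_template, extra)

prompt = augmented_template.format(
    premise="A dog is running in the park.",
    hypothesis="An animal is outdoors.",
)
```

Given model outputs collected for the base and augmented prompts on the same examples, `flip_rate` gives a rough measure of how sensitive the model is to the added sentence; the model call itself is left out of the sketch.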
Anthology ID: 2025.knowllm-1.12
Volume: Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Yuji Zhang, Canyu Chen, Sha Li, Mor Geva, Chi Han, Xiaozhi Wang, Shangbin Feng, Silin Gao, Isabelle Augenstein, Mohit Bansal, Manling Li, Heng Ji
Venues: KnowLLM | WS
Publisher: Association for Computational Linguistics
Pages: 140–152
URL: https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.knowllm-1.12/
DOI: 10.18653/v1/2025.knowllm-1.12
Cite (ACL): Toma Suzuki, Yusuke Sakai, Justin Vasselli, Hidetaka Kamigaito, and Taro Watanabe. 2025. Superfluous Instruction: Vulnerabilities Stemming from Task-Specific Superficial Expressions in Instruction Templates. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), pages 140–152, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Superfluous Instruction: Vulnerabilities Stemming from Task-Specific Superficial Expressions in Instruction Templates (Suzuki et al., KnowLLM 2025)
PDF: https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.knowllm-1.12.pdf