Revealing the Inherent Instructability of Pre-Trained Language Models

Seokhyun An, Minji Kim, Hyounghun Kim


Abstract
Instruction tuning—supervised fine-tuning using instruction-response pairs—is a key step in making pre-trained large language models (LLMs) instructable. Meanwhile, LLMs perform multitask learning during their pre-training, acquiring extensive knowledge and capabilities. We hypothesize that the pre-training stage can enable them to develop the ability to comprehend and address instructions. To verify this, we propose Response Tuning (RT), which removes the instruction and its corresponding mapping to the response from instruction tuning. Instead, it focuses solely on establishing a response distribution. Our experiments demonstrate that RT models, trained only on responses, can effectively respond to a wide range of instructions akin to their instruction-tuned counterparts. In addition, we observe that the models can recognize and reject unsafe queries after learning a safety policy only from the response data. Furthermore, we find that these observations extend to an in-context learning setting. These findings support our hypothesis, highlighting the extensive inherent capabilities of pre-trained LLMs.
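To make the contrast with standard instruction tuning concrete, here is a minimal illustrative sketch (not the authors' code) of how a Response Tuning training example might be built: the instruction is dropped and only the response tokens are modeled. The toy tokenizer and the IGNORE_INDEX label-masking convention are assumptions for illustration.

```python
# Minimal sketch contrasting instruction tuning with Response Tuning (RT),
# following the abstract's description: RT removes the instruction and trains
# only on the response distribution. Tokenizer and IGNORE_INDEX are
# illustrative assumptions, not details from the paper.

IGNORE_INDEX = -100  # label value typically ignored by cross-entropy loss


def toy_tokenize(text):
    # Stand-in tokenizer: one fake "token id" per whitespace-separated word.
    return [hash(w) % 32000 for w in text.split()]


def build_instruction_tuning_example(instruction, response):
    # Standard instruction tuning: condition on the instruction prompt,
    # compute loss only on the response tokens.
    prompt_ids = toy_tokenize(instruction)
    response_ids = toy_tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}


def build_response_tuning_example(response):
    # Response Tuning: the instruction (and its mapping to the response) is
    # removed; the model is trained only to model the response itself.
    response_ids = toy_tokenize(response)
    return {"input_ids": response_ids, "labels": list(response_ids)}


if __name__ == "__main__":
    it_ex = build_instruction_tuning_example(
        "Explain photosynthesis briefly.",
        "Photosynthesis converts light into chemical energy in plants.",
    )
    rt_ex = build_response_tuning_example(
        "Photosynthesis converts light into chemical energy in plants."
    )
    print(len(it_ex["input_ids"]), len(rt_ex["input_ids"]))
```

Under this sketch, the RT example carries no instruction tokens at all, which matches the paper's claim that only a response distribution is being established during fine-tuning.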
Anthology ID:
2025.findings-emnlp.285
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5305–5336
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.285/
DOI:
10.18653/v1/2025.findings-emnlp.285
Cite (ACL):
Seokhyun An, Minji Kim, and Hyounghun Kim. 2025. Revealing the Inherent Instructability of Pre-Trained Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 5305–5336, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Revealing the Inherent Instructability of Pre-Trained Language Models (An et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.285.pdf
Checklist:
2025.findings-emnlp.285.checklist.pdf