Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models

Hyeonseok Moon; Jaehyung Seo; Seungyoon Lee; Chanjun Park; Heui-Seok Lim

Abstract

Through numerous endeavors, large language models (LLMs) have witnessed significant advancements in their instruction-following capability. However, we discern that LLMs are prone to generating responses to instruction-formatted statements in an instinctive manner, rather than comprehending the underlying user intention residing within the given instructions. We also recognize that the significance of instruction understanding capability is largely overlooked in most LLM evaluation benchmarks. To ensure a more comprehensive evaluation of the instruction understanding capability of LLMs, we propose the Intention of Instruction (IntInst) benchmark, whose primary objective is to distinguish the appropriate instruction that accurately instructs the generation of a given context. IntInst presents four instruction candidates and requires LLMs to select one among them. Through extensive experiments with several instruction-tuned LLMs, we reveal that most LLMs struggle to grasp the actual intention concealed in the instruction, and we thoroughly analyze the factors influencing instruction understanding.

Anthology ID: 2025.findings-naacl.330
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5944–5964
URL: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.330/
Cite (ACL): Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, Chanjun Park, and Heuiseok Lim. 2025. Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 5944–5964, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models (Moon et al., Findings 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.330.pdf
