% Conference paper (LREC 2026). Normalized from an ACL Anthology preview export:
% retyped @article -> @inproceedings, moved the conference name from `journal`
% to `booktitle`, dropped the bogus volume slug, and canonicalized the URL.
@inproceedings{nishida-etal-2026-instructsum,
    title = "{InstructSum}: A Benchmark to Evaluate Instruction-Following Capability of Large Language Models in Summarization",
    author = "Nishida, Kosuke and
      Nishida, Kyosuke and
      Saito, Itsumi",
    editor = "Piperidis, Stelios and
      Bel, N{\'u}ria and
      van den Heuvel, Henk and
      Ide, Nancy and
      Krek, Simon and
      Toral, Antonio",
    booktitle = "Proceedings of the International Conference on Language Resources and Evaluation ({LREC} 2026)",
    month = may,
    year = "2026",
    address = "Palma de Mallorca, Spain",
    publisher = "ELRA Language Resource Association",
    url = "https://aclanthology.org/2026.lrec-main.779/",
    pages = "9940--9952",
    abstract = "Pre-trained large language models (LLMs) align their outputs with user intent through natural language instructions. In the summarization task, conciseness of the output is inherently required, which makes the instruction-following capability of LLMs particularly important. That is, providing supplementary information beyond the instruction can be undesirable. In this study, we introduce a novel benchmark, InstructSum, consisting of 3,309 types of instructions to evaluate the instruction-following capability in the summarization task. InstructSum has multiple instructions per source text, and thus it enables the evaluation of how LLMs adjust the content of the summary according to the instructions. Our experiments with six LLM families revealed the challenges that LLMs face in this task. For example, LLMs provide polite and helpful responses with irrelevant information; they go beyond instructions and fail to respond with a concise summary."
}
[InstructSum: A Benchmark to Evaluate Instruction-Following Capability of Large Language Models in Summarization](https://aclanthology.org/2026.lrec-main.779/) (Nishida et al., LREC 2026)
ACL