StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following

Jinnan Li, Jinzhe Li, Yue Wang, Yi Chang, Yuan Wu


Abstract
Multi-turn instruction following capability constitutes a core competency of large language models (LLMs) in real-world applications. Existing evaluation benchmarks predominantly focus on fine-grained constraint satisfaction and domain-specific capability assessment, yet overlook the crucial structural dependencies between dialogue turns that distinguish multi-turn from single-turn interactions. These structural dependencies not only reflect user intent but also establish an essential second dimension for instruction-following evaluation beyond constraint satisfaction. To address this gap, we propose StructFlowBench, a multi-turn instruction following benchmark with structural flow modeling. The benchmark defines an innovative structural flow framework with six fundamental inter-turn relationships. These relationships introduce novel structural constraints for model evaluation and also serve as generation parameters for creating customized dialogue flows tailored to specific scenarios. Adopting established LLM-based automatic evaluation methodologies, we conduct systematic evaluations of 13 leading open-source and closed-source LLMs. Experimental results reveal significant deficiencies in current models’ comprehension of multi-turn dialogue structures. The code is available at https://github.com/MLGroupJLU/StructFlowBench.
Anthology ID:
2025.findings-acl.486
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9322–9341
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.486/
Cite (ACL):
Jinnan Li, Jinzhe Li, Yue Wang, Yi Chang, and Yuan Wu. 2025. StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following. In Findings of the Association for Computational Linguistics: ACL 2025, pages 9322–9341, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.486.pdf