CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM

Juntae Lee; Jihwan Bang; Seunghan Yang; Simyung Chang

CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM

Juntae Lee, Jihwan Bang, Seunghan Yang, Simyung Chang

Abstract

We present CIFLEX (Contextual Instruction FLow with EXecution), a novel execution system for efficient sub-task handling in multi-turn interactions with a single on-device large language model (LLM). As LLMs become increasingly capable, a single model is expected to handle diverse sub-tasks that more effectively and comprehensively support answering user requests. Naive approach reprocesses the entire conversation context when switching between main and sub-tasks (e.g., query rewriting, summarization), incurring significant computational overhead. CIFLEX mitigates this overhead by reusing the key-value (KV) cache from the main task and injecting only task-specific instructions into isolated side paths. After sub-task execution, the model rolls back to the main path via cached context, thereby avoiding redundant prefill computation. To support sub-task selection, we also develop a hierarchical classification strategy tailored for small-scale models, decomposing multi-choice decisions into binary ones. Experiments show that CIFLEX significantly reduces computational costs without degrading task performance, enabling scalable and efficient multi-task dialogue on-device.

Anthology ID:: 2025.emnlp-main.533
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 10556–10570
Language:
URL:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.533/
DOI:
Bibkey:
Cite (ACL):: Juntae Lee, Jihwan Bang, Seunghan Yang, and Simyung Chang. 2025. CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 10556–10570, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM (Lee et al., EMNLP 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.533.pdf
Checklist:: 2025.emnlp-main.533.checklist.pdf

PDF Cite Search Checklist Fix data