Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Sanket Badhe, Deep Shah


Abstract
Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model’s System Prompt. Evaluated using Gemma-3 4B, PLD improved Macro F1 scores on StereoSet (57% to 90.0%) and Contract-NLI (67% to 83%), while increasing LogiQA accuracy to 70%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.
Anthology ID:
2026.acl-industry.142
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Yunyao Li, Georg Rehm, Mei Tu
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2131–2147
Language:
URL:
https://preview.aclanthology.org/ingestion-form-platform/2026.acl-industry.142/
DOI:
Bibkey:
Cite (ACL):
Sanket Badhe and Deep Shah. 2026. Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 2131–2147, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning (Badhe & Shah, ACL 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-form-platform/2026.acl-industry.142.pdf