Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications

Daniel Zagyva, Emmanouil Stergiadis, Laurens Van Der Maas, Aleksandra Dokic, Eran Fainman, Ilya Gusev, Moran Beladev


Abstract
In high-stakes industrial NLP applications, balancing generation quality with speed and efficiency presents significant challenges. We address these challenges by investigating two complementary optimization approaches: Medusa for speculative decoding and knowledge distillation (KD) for model compression. We demonstrate the practical application of these techniques in real-world travel-domain tasks, including trip planning, smart filters, and generating accommodation descriptions. We introduce modifications to the Medusa implementation: starting from base pre-trained models rather than conversationally fine-tuned ones, and developing a simplified single-stage training process for Medusa-2 that maintains performance while reducing computational requirements. Lastly, we present a novel framework that combines Medusa with knowledge distillation, achieving compounded benefits in both model size and inference speed. Our experiments with TinyLlama-1.1B as the student model and Llama-3.1-70B as the teacher show that the combined approach maintains the teacher's performance quality while reducing inference latency by 10-20x.
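To make the combined approach concrete, the sketch below shows one way a Medusa-style training objective can be attached to a knowledge-distillation loss for a small student model. This is a minimal illustration under stated assumptions, not the paper's implementation: the head architecture, the loss weighting, and all names (MedusaHead, distill_and_medusa_loss, alpha, temperature) are hypothetical and chosen only to show the idea of training extra decoding heads jointly with a KD objective in a single stage.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MedusaHead(nn.Module):
    # One extra decoding head; head k learns to predict the token k+1 steps beyond
    # the next token, so several tokens can be proposed per forward pass at inference.
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.res_proj = nn.Linear(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual block (x + SiLU(Wx)) followed by a vocabulary projection.
        return self.lm_head(hidden_states + F.silu(self.res_proj(hidden_states)))

def distill_and_medusa_loss(student_logits, teacher_logits, medusa_logits, labels,
                            temperature=2.0, alpha=0.5):
    # student_logits / teacher_logits: (batch, seq, vocab); labels: (batch, seq).
    # medusa_logits: list of (batch, seq, vocab) tensors, one per extra head.

    # Soft-target KD term: match the student's next-token distribution to the teacher's.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard-target term for the main head (standard next-token cross-entropy).
    ce = F.cross_entropy(
        student_logits[:, :-1].flatten(0, 1), labels[:, 1:].flatten()
    )

    # Medusa terms: head k (0-indexed) predicts the token k + 2 positions ahead,
    # so its logits at position t are compared against labels[t + k + 2].
    medusa = student_logits.new_zeros(())
    for k, logits in enumerate(medusa_logits):
        shift = k + 2
        medusa = medusa + F.cross_entropy(
            logits[:, :-shift].flatten(0, 1), labels[:, shift:].flatten()
        )

    # Single-stage objective: backbone (KD + CE) and Medusa heads trained jointly.
    return alpha * kd + (1.0 - alpha) * ce + medusa

At inference time, the extra heads propose candidate continuations that the main head verifies, which is where the speculative-decoding speedup comes from; the paper's actual training recipe, head configuration, and evaluation are in the PDF linked below.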
Anthology ID: 2025.acl-industry.48
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 684–692
URL: https://preview.aclanthology.org/landing_page/2025.acl-industry.48/
Cite (ACL): Daniel Zagyva, Emmanouil Stergiadis, Laurens Van Der Maas, Aleksandra Dokic, Eran Fainman, Ilya Gusev, and Moran Beladev. 2025. Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 684–692, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications (Zagyva et al., ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-industry.48.pdf