Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching

Mingzhe Li, Jing Xiang, Qishen Zhang, Kaiyang Wan, Xiuying Chen


Abstract
Knowledge distillation typically involves transferring knowledge from a Large Language Model (LLM) to a Smaller Language Model (SLM). However, in tasks like text matching, smaller fine-tuned models often produce more effective domain-specific representations as they focus on optimizing the similarity between input pairs. To combine the specialized strengths of small models with the rich semantic understanding of LLMs, we propose a flipped knowledge distillation paradigm, where the LLM learns from the SLM. To bridge the architectural gap between commonly used decoder-only LLMs and the encoder-based frameworks of smaller models, we reinterpret LLMs as encoder-decoder models using LoRA. In this setup, the encoder generates compressed text representations, while the decoder transforms them into the output space. During training, the encoder produces text representations and computes their similarities, which are then aligned with the similarity scores produced by the teacher model. We achieve this alignment using our proposed Margin-aware Contrastive Learning (MCL) approach. MCL ensures accurate similarity for both positive and negative pairs, while also adaptively handling differences within positive and negative samples. We validate the effectiveness of our approach on financial and healthcare benchmarks as well as real-world online applications. Our model has been fully deployed in an online application environment, demonstrating its practical utility.
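The abstract does not spell out the exact formulation of Margin-aware Contrastive Learning, so the snippet below is only a minimal PyTorch sketch of one way the described alignment could work: the LLM encoder (student) produces pair embeddings whose cosine similarity is pushed toward the similarity score of the fine-tuned SLM (teacher), with an adaptive margin that treats confident and uncertain teacher pairs differently. The function name, margin schedule, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def margin_aware_contrastive_loss(student_emb_a, student_emb_b,
                                  teacher_sim, labels,
                                  base_margin=0.1):
    """Hypothetical sketch of a margin-aware alignment loss.

    student_emb_a, student_emb_b: [batch, dim] pair embeddings from the
        LLM encoder (the "student" in the flipped setup).
    teacher_sim: [batch] similarity scores from the fine-tuned SLM (teacher).
    labels: [batch] 1 for positive (matching) pairs, 0 for negative pairs.
    """
    # Student similarity between the two texts of each pair.
    student_sim = F.cosine_similarity(student_emb_a, student_emb_b, dim=-1)

    # Adaptive margin (assumption): pairs where the teacher is confident
    # (similarity far from 0.5) get a larger tolerance than ambiguous pairs.
    margin = base_margin * torch.abs(teacher_sim - 0.5) * 2.0

    labels = labels.float()
    # Positive pairs: student similarity should not fall below the
    # teacher's score by more than the margin.
    pos_loss = labels * F.relu((teacher_sim - margin) - student_sim)
    # Negative pairs: student similarity should not exceed the
    # teacher's score by more than the margin.
    neg_loss = (1.0 - labels) * F.relu(student_sim - (teacher_sim + margin))

    return (pos_loss + neg_loss).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    a = torch.randn(8, 256)           # LLM-encoder embeddings for text A
    b = torch.randn(8, 256)           # LLM-encoder embeddings for text B
    t_sim = torch.rand(8)             # teacher (SLM) similarity scores in [0, 1]
    y = torch.randint(0, 2, (8,))     # pair labels
    print(margin_aware_contrastive_loss(a, b, t_sim, y))
```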
Anthology ID:
2025.acl-long.1081
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
22218–22229
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1081/
Cite (ACL):
Mingzhe Li, Jing Xiang, Qishen Zhang, Kaiyang Wan, and Xiuying Chen. 2025. Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22218–22229, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching (Li et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1081.pdf