Dengwu He
2026
A Novel Matching Paradigm: Unified Generative and Discriminative LLM with Prompt Compression for Relevance Learning
Guoliang Zhao | Zixin Cui | Chao Ye | Dengwu He | Fei Huang | Yubo Liu | Shuanglong Li | Tzungren Kuo | Bin Ding | Shuang Zhang | Kunhong Zhu | Zhi Guo | Liu Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
The matching paradigm is fundamental to large-scale information retrieval and is widely used in industrial search and advertising systems. Existing approaches employ Large Language Models (LLMs) primarily as feature extractors, underutilizing their full modeling capabilities. To address this limitation, we propose a novel matching paradigm, termed the Unified Generative and Discriminative LLM (UGD). It integrates two-tower, single-tower, and generative tasks within a unified LLM framework via attention-mask partitioning. Through a multi-task fine-tuning mechanism, this lets generative tasks serve as auxiliary supervision for discriminative learning and facilitates distillation from the single-tower to the two-tower architecture. To satisfy online latency constraints, we further introduce a self-distillation variant of UGD with a KMeans-enhanced linearized RQVAE for prompt compression and quantization. This design compresses and quantizes landing-page documents during inference, improving serving efficiency and reducing storage overhead. Extensive experiments show that UGD achieves superior performance and strong practical value. The framework has been deployed in an industrial search engine serving hundreds of millions of users and hundreds of thousands of advertisers, significantly enhancing the search experience. Open access upon publication.
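The abstract does not spell out how the KMeans-enhanced linearized RQVAE works, so the following is only a minimal sketch of generic residual quantization with k-means codebooks, the general idea that RQ-style compression builds on. It is not the authors' method. All function names (`kmeans`, `fit_residual_codebooks`, `encode`, `decode`) and parameters (`levels`, `k`, `iters`, `seed`) are hypothetical and chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means (assumption: random initialization from the data)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def fit_residual_codebooks(points, levels, k, iters=10, seed=0):
    """Fit one k-means codebook per level on the residuals left by earlier levels."""
    rng = random.Random(seed)
    residuals = [list(p) for p in points]
    codebooks = []
    for _ in range(levels):
        cb = kmeans(residuals, k, iters, rng)
        codebooks.append(cb)
        # Subtract the chosen codeword so the next level models what is left.
        residuals = [
            [x - c for x, c in zip(r, cb[nearest(r, cb)])] for r in residuals
        ]
    return codebooks

def encode(vec, codebooks):
    """Greedy encoding: one codeword index per level."""
    residual, codes = list(vec), []
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = [x - c for x, c in zip(residual, cb[idx])]
    return codes

def decode(codes, codebooks):
    """Reconstruct a vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out
```

Under these assumptions, a document embedding would be stored as `levels` small integers instead of a full float vector, which is the kind of storage saving the abstract refers to; how UGD linearizes the quantizer or ties it to prompt compression is not described here.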