UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

Jon Saad-Falcon; Omar Khattab; Keshav Santhanam; Radu Florian; Martin Franz; Salim Roukos; Avirup Sil; Md Sultan; Christopher Potts

doi:10.18653/v1/2023.emnlp-main.693

Abstract

Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods.
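The stages named in the abstract can be laid out as a short orchestration sketch. This is an illustration only, not the authors' implementation: the function `udapdr_style_pipeline`, its parameters, and the four callables are hypothetical placeholders. Two details are assumptions that the abstract does not state: that the seed queries are passed to the cheap model as examples, and that each reranker is trained on its own slice of the synthetic data.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (synthetic query, passage it was generated from)


def udapdr_style_pipeline(
    passages: Sequence[str],
    expensive_llm: Callable[[str], str],                 # hypothetical: passage -> query
    cheap_llm: Callable[[str, Sequence[Pair]], str],     # hypothetical: passage + examples -> query
    train_reranker: Callable[[Sequence[Pair]], object],  # hypothetical fine-tuning routine
    distill: Callable[[Sequence[object]], object],       # hypothetical distillation routine
    n_seed: int = 2,
    n_rerankers: int = 2,
) -> object:
    # Stage 1: a small number of synthetic queries from the expensive model.
    seed_pairs: List[Pair] = [(expensive_llm(p), p) for p in passages[:n_seed]]

    # Stage 2: many synthetic queries from the cheap model.
    # Assumption: the seed pairs are supplied to it as examples.
    synthetic: List[Pair] = [(cheap_llm(p, seed_pairs), p) for p in passages]

    # Stage 3: fine-tune a family of rerankers.
    # Assumption: each reranker sees a disjoint slice of the synthetic pairs.
    shards = [synthetic[i::n_rerankers] for i in range(n_rerankers)]
    rerankers = [train_reranker(shard) for shard in shards]

    # Stage 4: distill the rerankers into a single retriever.
    return distill(rerankers)
```

Passing the models and training routines in as callables keeps the sketch independent of any particular LLM API or retrieval library; concrete choices would be substituted at the call site.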

Anthology ID: 2023.emnlp-main.693
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11265–11279
URL: https://preview.aclanthology.org/ingest_wac_2008/2023.emnlp-main.693/
DOI: 10.18653/v1/2023.emnlp-main.693
Cite (ACL): Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Sultan, and Christopher Potts. 2023. UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11265–11279, Singapore. Association for Computational Linguistics.
Cite (Informal): UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers (Saad-Falcon et al., EMNLP 2023)
PDF: https://preview.aclanthology.org/ingest_wac_2008/2023.emnlp-main.693.pdf
Video: https://preview.aclanthology.org/ingest_wac_2008/2023.emnlp-main.693.mp4
