It’s All Relative! – A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction

Aditi Chaudhary; Karthik Raman; Michael Bendersky

Abstract

Large language models (LLMs) have shown promising ability to generate synthetic query-document pairs by prompting with as few as 8 demonstrations. This has enabled building better IR models, especially for tasks with no training data. Typically, such synthetic query generation (QGen) approaches condition on an input context (e.g. a text document) and generate a query relevant to that context, or condition the QGen additionally on the relevance label (e.g. relevant vs. irrelevant) to generate queries across relevance buckets. However, we find that such QGen approaches are sub-optimal as they require the model to reason about the desired label and the input from a handful of examples. In this work, we propose to reduce this burden on LLMs by generating queries simultaneously for different labels. We hypothesize that instead of asking the model to generate, say, an irrelevant query given an input context, asking the model to generate an irrelevant query relative to a relevant query is a much simpler task. Extensive experimentation across nine IR datasets shows that synthetic queries generated in such a fashion translate to better downstream performance.
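The idea of generating queries for different labels in one pass can be illustrated with a minimal Python sketch. It is not the authors' prompt or code: the `Demo` class, the `build_pairwise_prompt` and `parse_pair` helpers, and the "Document / Relevant query / Irrelevant query" field labels are all assumptions made for illustration. The sketch only shows how a few-shot prompt might ask for a relevant query first, so that the irrelevant query is written relative to it rather than from the document alone.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical field label; the paper's actual prompt wording is not given here.
IRRELEVANT_MARKER = "Irrelevant query:"


@dataclass
class Demo:
    """One few-shot demonstration: a document with a query for each label."""
    document: str
    relevant_query: str
    irrelevant_query: str


def build_pairwise_prompt(demos: List[Demo], document: str) -> str:
    """Build a prompt that asks for both queries for `document` in one generation."""
    parts = []
    for d in demos:
        parts.append(
            f"Document: {d.document}\n"
            f"Relevant query: {d.relevant_query}\n"
            f"{IRRELEVANT_MARKER} {d.irrelevant_query}\n"
        )
    # Leave the relevant-query slot open; the model is expected to continue
    # with the relevant query and then the irrelevant one, as in the demos.
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)


def parse_pair(completion: str) -> Tuple[str, str]:
    """Split a model completion into (relevant_query, irrelevant_query)."""
    relevant, sep, irrelevant = completion.partition(IRRELEVANT_MARKER)
    if not sep:
        raise ValueError("completion does not contain an irrelevant query")
    return relevant.strip(), irrelevant.strip()
```

The string returned by `build_pairwise_prompt` would be sent to whichever LLM is being used; that call is left out because it depends on the model API. Each parsed pair yields two synthetic training examples for the same document, one per relevance label.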

Anthology ID: 2024.findings-naacl.107
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1645–1664
URL: https://aclanthology.org/2024.findings-naacl.107
Cite (ACL): Aditi Chaudhary, Karthik Raman, and Michael Bendersky. 2024. It’s All Relative! – A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1645–1664, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): It’s All Relative! – A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction (Chaudhary et al., Findings 2024)
PDF: https://preview.aclanthology.org/naacl24-info/2024.findings-naacl.107.pdf