DiffusionRet: Diffusion-Enhanced Generative Retriever using Constrained Decoding

Shanbao Qiao, Xuebing Liu, Seung-Hoon Na


Abstract
Generative retrieval, which maps a query to its relevant document identifiers (docids), has recently emerged as a new information retrieval (IR) paradigm. However, it suffers from 1) the lack of an intermediate reasoning step, caused by merely using a query to perform hierarchical classification, and 2) the pretrain-finetune discrepancy, which comes from the use of artificial docid symbols. To address these limitations, we propose the novel approach of using document generation from a query as an intermediate step before retrieval, presenting diffusion-enhanced generative retrieval (DiffusionRet), which consists of two processing steps: 1) diffusion-based document generation, which employs a sequence-to-sequence diffusion model to produce a pseudo document sample from a query, expected to be semantically close to a relevant document; and 2) n-gram-based generative retrieval, which uses another sequence-to-sequence model to generate n-grams that appear in the collection index, linking a generated sample to an original document. Experimental results on the MS MARCO and Natural Questions datasets show that the proposed DiffusionRet significantly outperforms all existing generative retrieval methods and achieves state-of-the-art performance, even with a much smaller number of parameters.
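The second step restricts decoding so that every generated n-gram actually occurs in the collection index, which is the essence of constrained decoding. A minimal toy sketch of such trie-constrained decoding follows; the function names, the nested-dict trie, and the greedy per-token scoring stand-in for model logits are illustrative assumptions, not the paper's implementation:

```python
def build_trie(ngrams):
    """Build a nested-dict trie over the n-grams (tuples of tokens)
    that occur in the collection index."""
    root = {}
    for ngram in ngrams:
        node = root
        for tok in ngram:
            node = node.setdefault(tok, {})
    return root

def allowed_next_tokens(trie, prefix):
    """Return the set of tokens the index permits after `prefix`;
    empty set means no indexed n-gram continues this prefix."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node)

def constrained_greedy_decode(trie, scores, max_len=8):
    """Greedy decoding where `scores` (a token->float dict standing in
    for model logits) is masked at each step by the trie, so the output
    is always an n-gram present in the index."""
    prefix = []
    for _ in range(max_len):
        allowed = allowed_next_tokens(trie, prefix)
        if not allowed:
            break
        prefix.append(max(allowed, key=lambda t: scores.get(t, 0.0)))
    return prefix

# Usage: only indexed continuations are reachable, regardless of scores.
trie = build_trie([("deep", "learning"), ("deep", "neural", "networks")])
print(constrained_greedy_decode(
    trie, {"deep": 1.0, "neural": 0.9, "learning": 0.5, "networks": 0.8}))
# → ['deep', 'neural', 'networks']
```

In practice the same masking is applied per beam inside beam search over the seq2seq model's vocabulary, but the invariant is identical: at every step the candidate set is intersected with the continuations licensed by the index.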
Anthology ID:
2023.findings-emnlp.638
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9515–9529
URL:
https://aclanthology.org/2023.findings-emnlp.638
DOI:
10.18653/v1/2023.findings-emnlp.638
Cite (ACL):
Shanbao Qiao, Xuebing Liu, and Seung-Hoon Na. 2023. DiffusionRet: Diffusion-Enhanced Generative Retriever using Constrained Decoding. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9515–9529, Singapore. Association for Computational Linguistics.
Cite (Informal):
DiffusionRet: Diffusion-Enhanced Generative Retriever using Constrained Decoding (Qiao et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2023.findings-emnlp.638.pdf