Abstract
Existing passage retrieval systems typically adopt a two-stage retrieve-then-rerank pipeline. To obtain an effective reranking model, many prior works have focused on improving model architectures, such as leveraging powerful pretrained large language models (LLMs) and designing better objective functions. However, less attention has been paid to collecting high-quality training data. In this paper, we propose HYRR, a framework for training robust reranking models. Specifically, we propose a simple but effective approach to selecting training data using hybrid retrievers. Our experiments show that rerankers trained with HYRR are robust to different first-stage retrievers. Moreover, evaluations on the MS MARCO and BEIR datasets demonstrate that our proposed framework generalizes effectively to both supervised and zero-shot retrieval settings.
- Anthology ID:
- 2024.lrec-main.748
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 8528–8534
- URL:
- https://aclanthology.org/2024.lrec-main.748
- Cite (ACL):
- Jing Lu, Keith Hall, Ji Ma, and Jianmo Ni. 2024. HYRR: Hybrid Infused Reranking for Passage Retrieval. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8528–8534, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- HYRR: Hybrid Infused Reranking for Passage Retrieval (Lu et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2024.lrec-main.748.pdf