Abstract
Existing dense retrieval models for scientific documents have been optimized either for retrieval by short queries or for document similarity, but usually not for both. In this paper, we explore the space of combining multiple objectives to achieve a single representation model that strikes a good balance between both modes of dense retrieval, combining the relevance judgements from MS MARCO with the citation similarity of SPECTER and the self-supervised objective of independent cropping. We also consider the addition of training data from document co-citation in a sentence context and domain-specific synthetic data. We show that combining multiple objectives yields models that generalize well across different benchmark tasks, improving by up to 73% over models trained on a single objective.
- Anthology ID:
- 2022.sdp-1.9
- Volume:
- Proceedings of the Third Workshop on Scholarly Document Processing
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Venue:
- sdp
- Publisher:
- Association for Computational Linguistics
- Pages:
- 80–88
- URL:
- https://aclanthology.org/2022.sdp-1.9
- Cite (ACL):
- Mathias Parisot and Jakub Zavrel. 2022. Multi-objective Representation Learning for Scientific Document Retrieval. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 80–88, Gyeongju, Republic of Korea. Association for Computational Linguistics.
- Cite (Informal):
- Multi-objective Representation Learning for Scientific Document Retrieval (Parisot & Zavrel, sdp 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.sdp-1.9.pdf
- Code:
- zetaalphavector/multi-obj-repr-learning
- Data:
- BEIR, MS MARCO, SciDocs, SciFact
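
The abstract describes training one encoder on several objectives at once. The following is a minimal sketch of one way such a combination could look, not the authors' implementation: it assumes each objective supplies a batch of (anchor, positive) embedding pairs, scores them with an in-batch contrastive (InfoNCE-style) loss, and sums the per-objective losses with fixed weights. The function names `info_nce` and `combined_loss`, the temperature value, the objective names, and the weights are all hypothetical choices made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.05):
    """In-batch contrastive loss (assumed objective, not taken from the paper).

    Row i of `positives` is the positive for row i of `anchors`;
    all other rows of `positives` act as negatives.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    losses = []
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # negative log-softmax of the positive
    return sum(losses) / len(losses)


def combined_loss(batches, weights):
    """Weighted sum of per-objective losses.

    `batches` maps an objective name to an (anchors, positives) pair;
    `weights` maps the same names to hypothetical mixing weights.
    """
    return sum(weights[name] * info_nce(a, p) for name, (a, p) in batches.items())


# Toy 2-dimensional embeddings standing in for encoder outputs.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[0.9, 0.1], [0.2, 0.8]]
papers = [[1.0, 0.2], [0.1, 1.0]]
cited = [[0.8, 0.3], [0.3, 0.9]]

loss = combined_loss(
    {"query_passage": (queries, passages), "citation": (papers, cited)},
    {"query_passage": 1.0, "citation": 0.5},
)
print(f"combined loss: {loss:.4f}")
```

In this sketch, query-passage pairs would play the role of MS MARCO relevance judgements and paper-cited-paper pairs the role of SPECTER-style citation similarity; a self-supervised objective such as independent cropping could be added as a third entry in the same dictionaries.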