Abstract
Pre-trained language models (PLMs) show promise in retrieval tasks but struggle with out-of-domain data due to distribution shifts. To address this, generative domain adaptation (DA), known as GPL, tackles distribution shifts by generating pseudo queries and labels to train models to predict query-document relationships in new domains. However, it overlooks the domain distribution, so the adapted model struggles to align with the distribution of the target domain. We therefore propose Distribution-Aware Domain Adaptation (DADA), which guides the model to incorporate domain distribution knowledge at the level of both a single document and the corpus, referred to as observation-level feedback and domain-level feedback, respectively. Our method effectively adapts the model to the target domain and expands document representations to unseen gold query terms using domain and observation feedback, as demonstrated by empirical results on the BEIR benchmark.
- Anthology ID: 2024.findings-acl.825
- Volume: Findings of the Association for Computational Linguistics: ACL 2024
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 13882–13893
- URL: https://aclanthology.org/2024.findings-acl.825
- DOI: 10.18653/v1/2024.findings-acl.825
- Cite (ACL): Dohyeon Lee, Jongyoon Kim, Seung-won Hwang, and Joonsuk Park. 2024. DADA: Distribution-Aware Domain Adaptation of PLMs for Information Retrieval. In Findings of the Association for Computational Linguistics: ACL 2024, pages 13882–13893, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): DADA: Distribution-Aware Domain Adaptation of PLMs for Information Retrieval (Lee et al., Findings 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-acl.825.pdf