A Little Bit Is Worse Than None: Ranking with Limited Training Data

Xinyu Zhang, Andrew Yates, Jimmy Lin


Abstract
Researchers have proposed simple yet effective techniques for the retrieval problem based on using BERT as a relevance classifier to rerank initial candidates from keyword search. In this work, we tackle the challenge of fine-tuning these models for specific domains in a data and computationally efficient manner. Typically, researchers fine-tune models using corpus-specific labeled data from sources such as TREC. We first answer the question: How much data of this type do we need? Recognizing that the most computationally efficient training is no training, we explore zero-shot ranking using BERT models that have already been fine-tuned with the large MS MARCO passage retrieval dataset. We arrive at the surprising and novel finding that “some” labeled in-domain data can be worse than none at all.
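The abstract describes reranking keyword-search candidates with a BERT relevance classifier, optionally in a zero-shot fashion using a model already fine-tuned on MS MARCO. The snippet below is a minimal sketch of that reranking setup, not the authors' implementation; it assumes the sentence-transformers library and the publicly available cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint as a stand-in for an MS MARCO-tuned BERT reranker.

```python
# Minimal zero-shot reranking sketch (illustrative; not the paper's code).
# Assumes sentence-transformers and the public MS MARCO cross-encoder checkpoint.
from sentence_transformers import CrossEncoder

# Candidate passages would normally come from an initial keyword search (e.g., BM25).
query = "what causes heart attacks"
candidates = [
    "A heart attack occurs when blood flow to the heart is blocked.",
    "The marathon is a long-distance race of 42.195 kilometres.",
    "Coronary artery disease is the main cause of heart attacks.",
]

# Score each (query, passage) pair with the MS MARCO-tuned cross-encoder,
# then rerank the keyword-search candidates by predicted relevance.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, passage) for passage in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)

for passage, score in reranked:
    print(f"{score:.3f}\t{passage}")
```

In the zero-shot setting studied in the paper, no further in-domain fine-tuning would be applied before reranking the target corpus.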
Anthology ID:
2020.sustainlp-1.14
Volume:
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Nafise Sadat Moosavi, Angela Fan, Vered Shwartz, Goran Glavaš, Shafiq Joty, Alex Wang, Thomas Wolf
Venue:
sustainlp
Publisher:
Association for Computational Linguistics
Pages:
107–112
URL:
https://aclanthology.org/2020.sustainlp-1.14
DOI:
10.18653/v1/2020.sustainlp-1.14
Cite (ACL):
Xinyu Zhang, Andrew Yates, and Jimmy Lin. 2020. A Little Bit Is Worse Than None: Ranking with Limited Training Data. In Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, pages 107–112, Online. Association for Computational Linguistics.
Cite (Informal):
A Little Bit Is Worse Than None: Ranking with Limited Training Data (Zhang et al., sustainlp 2020)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/2020.sustainlp-1.14.pdf
Video:
https://slideslive.com/38939436
Data
MS MARCO