Abstract
Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., 99.99% of examples are negatives). In contrast, many recent datasets heuristically choose examples to ensure label balance. We show that these heuristics lead to trained models that generalize poorly: State-of-the-art models trained on QQP and WikiQA each have only 2.4% average precision when evaluated on realistically imbalanced test data. We instead collect training data with active learning, using a BERT-based embedding model to efficiently retrieve uncertain points from a very large pool of unlabeled utterance pairs. By creating balanced training data with more informative negative examples, active learning greatly improves average precision to 32.5% on QQP and 20.1% on WikiQA.
- Anthology ID:
- 2020.findings-emnlp.305
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3400–3413
- URL:
- https://aclanthology.org/2020.findings-emnlp.305
- DOI:
- 10.18653/v1/2020.findings-emnlp.305
- Cite (ACL):
- Stephen Mussmann, Robin Jia, and Percy Liang. 2020. On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3400–3413, Online. Association for Computational Linguistics.
- Cite (Informal):
- On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks (Mussmann et al., Findings 2020)
- PDF:
- https://aclanthology.org/2020.findings-emnlp.305.pdf
- Code
- worksheets/0x39ba5559
- Data
- GLUE, WikiQA
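The abstract's core idea of retrieving uncertain pairs from a large unlabeled pool can be sketched with plain uncertainty sampling: rank candidates by how close the model's predicted positive probability is to 0.5 and label the closest ones. This is a minimal illustration, not the paper's implementation; the `prob_fn` scorer and the toy probabilities below are hypothetical stand-ins for the BERT-based pairwise model.

```python
import heapq


def select_uncertain(pairs, prob_fn, k):
    """Return the k candidate pairs whose predicted positive
    probability is closest to 0.5 (i.e., the most uncertain)."""
    return heapq.nsmallest(k, pairs, key=lambda p: abs(prob_fn(p) - 0.5))


# Toy stand-in for a learned pairwise model's probability output.
toy_probs = {
    ("a", "b"): 0.97,  # confident positive -> uninformative
    ("a", "c"): 0.52,  # near 0.5 -> informative
    ("b", "c"): 0.08,  # confident negative -> uninformative
    ("a", "d"): 0.45,  # near 0.5 -> informative
}

batch = select_uncertain(list(toy_probs), toy_probs.get, k=2)
# batch holds ("a", "c") and ("a", "d"), the two most uncertain pairs
```

In the paper's setting the pool is far too large to score exhaustively, hence the embedding-based retrieval; this sketch only shows the selection criterion itself.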