Abstract
Data Maps (Swayamdipta et al., 2020) have emerged as a powerful tool for diagnosing large annotated datasets. Given a model fitted on a dataset, these maps show each data instance from the dataset in a 2-dimensional space defined by a) the model’s confidence in the true class and b) the variability of this confidence. In previous work, confidence and variability are usually computed using training dynamics, which requires the fitting of a strong model to the dataset. In this paper, we introduce a novel approach: Zero-Shot Data Maps based on fast bi-encoder networks. For each data point, confidence on the true label and variability are computed over the members of an ensemble of zero-shot models constructed with different, but semantically equivalent, label descriptions, i.e., textual representations of each class in a given label space. We conduct a comparative analysis of maps compiled using traditional training dynamics and our proposed zero-shot models across various datasets. Our findings reveal that Zero-Shot Data Maps generally match those produced by the traditional method while delivering up to a 14x speedup. The code is available [here](https://github.com/symanto-research/zeroshot-cartography).

- Anthology ID:
- 2023.findings-emnlp.554
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 8264–8277
- URL:
- https://aclanthology.org/2023.findings-emnlp.554
- DOI:
- 10.18653/v1/2023.findings-emnlp.554
- Cite (ACL):
- Angelo Basile, Marc Franco-Salvador, and Paolo Rosso. 2023. Zero-Shot Data Maps. Efficient Dataset Cartography Without Model Training. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8264–8277, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Zero-Shot Data Maps. Efficient Dataset Cartography Without Model Training (Basile et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2023.findings-emnlp.554.pdf
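
The two map coordinates described in the abstract can be illustrated with a minimal sketch. It assumes the per-member class probabilities have already been produced by the zero-shot ensemble (the bi-encoder scoring itself is not shown), and it assumes confidence is the mean and variability the population standard deviation of the true-class probability across ensemble members. The function name `zero_shot_data_map` and the toy numbers are hypothetical and are not taken from the paper or its repository.

```python
from statistics import fmean, pstdev


def zero_shot_data_map(probs, labels):
    """Return one (confidence, variability) pair per example.

    probs[m][i][c] is the probability that ensemble member m assigns
    class c to example i; labels[i] is the gold class index of example i.
    """
    coords = []
    for i, gold in enumerate(labels):
        # Probability of the gold class under each ensemble member.
        gold_probs = [member[i][gold] for member in probs]
        # Assumption: mean = confidence, population std = variability.
        coords.append((fmean(gold_probs), pstdev(gold_probs)))
    return coords


# Toy input: 3 hypothetical ensemble members (one per label description),
# 2 examples, 2 classes.
probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.7, 0.3], [0.1, 0.9]],
]
labels = [0, 1]

data_map = zero_shot_data_map(probs, labels)
for confidence, variability in data_map:
    print(f"confidence={confidence:.3f} variability={variability:.3f}")
```

In this toy input the second example has lower confidence and higher variability than the first, because the ensemble members disagree more about its gold class.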