Want To Reduce Labeling Cost? GPT-3 Can Help
Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
Abstract
Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 170 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than using labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.- Anthology ID:
- 2021.findings-emnlp.354
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4195–4205
- Language:
- URL:
- https://aclanthology.org/2021.findings-emnlp.354
- DOI:
- 10.18653/v1/2021.findings-emnlp.354
- Cite (ACL):
- Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, and Michael Zeng. 2021. Want To Reduce Labeling Cost? GPT-3 Can Help. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4195–4205, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Want To Reduce Labeling Cost? GPT-3 Can Help (Wang et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2021.findings-emnlp.354.pdf
- Data
- AG News, SQuAD, SST