Abstract
In the aftermath of GPT-3.5, commonly known as ChatGPT, researchers have attempted to assess its capacity for lowering annotation cost, whether through zero-shot learning, generating new data, or replacing human annotators. Some studies have also investigated its use for data augmentation (DA), but only in limited contexts, which leaves open the question of how ChatGPT performs compared to state-of-the-art algorithms. In this paper, we use ChatGPT to create new data both with paraphrasing and with zero-shot generation, and compare it to seven other algorithms. We show that while ChatGPT performs exceptionally well on some simpler data, it does not perform better overall than the other algorithms, yet demands much greater involvement from the practitioner because ChatGPT often refuses to answer due to sensitive content in the datasets.
- Anthology ID:
- 2023.findings-emnlp.1044
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15606–15615
- URL:
- https://aclanthology.org/2023.findings-emnlp.1044
- DOI:
- 10.18653/v1/2023.findings-emnlp.1044
- Cite (ACL):
- Frédéric Piedboeuf and Philippe Langlais. 2023. Is ChatGPT the ultimate Data Augmentation Algorithm?. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15606–15615, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Is ChatGPT the ultimate Data Augmentation Algorithm? (Piedboeuf & Langlais, Findings 2023)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2023.findings-emnlp.1044.pdf