Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges

Bosheng Ding; Chengwei Qin; Ruochen Zhao; Tianze Luo; Xinze Li; Guizhen Chen; Wenhan Xia; Junjie Hu; Luu Anh Tuan; Shafiq Joty

doi:10.18653/v1/2024.findings-acl.97

Abstract

In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of LLMs on DA, particularly addressing the unique challenges and opportunities they present in the context of natural language processing (NLP) and beyond. From both data and learning perspectives, we examine various strategies that utilize LLMs for data augmentation, including a novel exploration of learning paradigms where LLM-generated data is used for diverse forms of further training. Additionally, this paper highlights the primary open challenges faced in this domain, ranging from controllable data augmentation to multi-modal data augmentation. This survey highlights a paradigm shift introduced by LLMs in DA, and aims to serve as a comprehensive guide for researchers and practitioners.

Anthology ID: 2024.findings-acl.97
Volume: Findings of the Association for Computational Linguistics ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand and virtual meeting
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1679–1705
URL: https://aclanthology.org/2024.findings-acl.97
DOI: 10.18653/v1/2024.findings-acl.97
Cite (ACL): Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, and Shafiq Joty. 2024. Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges. In Findings of the Association for Computational Linguistics ACL 2024, pages 1679–1705, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal): Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges (Ding et al., Findings 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-5/2024.findings-acl.97.pdf
