Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

Guangzhan Wang, Hongyu Zhang, Beijun Shen, Xiaodong Gu


Abstract
Data augmentation is a critical technique in deep learning. Traditional methods such as back-translation typically focus on lexical-level rephrasing, which mainly produces variations with the same semantics. While large language models (LLMs) have enhanced text augmentation through their "knowledge emergence" capability, controlling the style and structure of their outputs remains challenging and requires meticulous prompt engineering. In this paper, we propose LMTransplant, a novel text augmentation paradigm that leverages LLMs. The core idea of LMTransplant is transplant-then-regenerate: incorporating the seed text into a context expanded by an LLM, then asking the LLM to regenerate a variant based on that expanded context. This strategy allows the model to create more diverse and creative content-level variants by fully leveraging the knowledge embedded in LLMs, while preserving the core attributes of the original text. We evaluate LMTransplant across various text-related tasks and demonstrate its superior performance over existing text augmentation methods. Moreover, LMTransplant shows excellent scalability as the size of the augmented data grows.
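The transplant-then-regenerate idea described above can be sketched as a two-step prompting loop. This is a hedged illustration, not the paper's implementation: `call_llm` is a hypothetical placeholder (a deterministic stub here so the example runs), and the prompt templates are assumptions.

```python
# Sketch of transplant-then-regenerate (illustrative, assumed prompts).
# `call_llm` is a placeholder for any chat-completion API; this stub
# returns canned text so the example is self-contained and runnable.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would query an LLM here.
    if "Continue the passage" in prompt:
        return "Staff answered questions patiently, and checkout was quick."
    return "The store offered a wide selection at fair prices."

def transplant_then_regenerate(seed_text: str) -> str:
    # Step 1 (transplant): embed the seed text into an LLM-expanded context.
    context = call_llm(f"Continue the passage naturally:\n{seed_text}\n")
    # Step 2 (regenerate): mask the seed's slot and ask the LLM to rewrite
    # it, using the expanded context to keep the seed's core attributes.
    variant = call_llm(
        f'A passage continues with: "{context}"\n'
        "Write the sentence that could precede it, matching its topic and tone."
    )
    return variant

seed = "Great service and friendly staff."
print(transplant_then_regenerate(seed))
```

With a real LLM backend, the returned variant is a content-level rewrite of the seed rather than a lexical paraphrase, since it is generated fresh from the surrounding context.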
Anthology ID:
2025.emnlp-main.702
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
13917–13931
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.702/
Cite (ACL):
Guangzhan Wang, Hongyu Zhang, Beijun Shen, and Xiaodong Gu. 2025. Transplant Then Regenerate: A New Paradigm for Text Data Augmentation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13917–13931, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Transplant Then Regenerate: A New Paradigm for Text Data Augmentation (Wang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.702.pdf
Checklist:
 2025.emnlp-main.702.checklist.pdf