Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining

Jianzhu Bao, Yuqi Huang, Yang Sun, Wenya Wang, Yice Zhang, Bojun Jin, Ruifeng Xu


Abstract
The advancement of Argument Mining (AM) is hindered by a critical bottleneck: the scarcity of structure-annotated datasets, which are expensive to create manually. Inspired by recent successes in synthetic data generation across various NLP tasks, this paper explores methods for using LLMs to generate synthetic data for AM. We investigate two complementary synthesis perspectives: a quality-oriented synthesis approach, which employs structure-aware paraphrasing to preserve annotation quality, and a diversity-oriented synthesis approach, which generates novel argumentative texts with diverse topics and argument structures. Experiments on three datasets show that augmenting the original training data with our synthetic data, particularly when combining both quality- and diversity-oriented instances, significantly enhances the performance of existing AM models in both full-data and low-resource settings. Moreover, the positive correlation between synthetic data volume and model performance highlights the scalability of our methods.
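The abstract does not spell out how structure-aware paraphrasing is checked or how the synthetic instances are mixed with the original data, so the following is only a hypothetical sketch of one way such a pipeline could look, not the authors' implementation. It assumes a simple annotation schema (labeled argument components plus (source, target, type) relations) and a filter that keeps a paraphrase only if its annotation skeleton matches the original. All names (`Component`, `Instance`, `preserves_structure`, `build_training_set`) and the "Claim"/"Premise"/"Support" labels are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    text: str   # surface text of the argument component
    label: str  # e.g. "Claim" or "Premise" (assumed label set)


@dataclass
class Instance:
    text: str
    components: list[Component]
    # Hypothetical schema: (source_index, target_index, relation_type)
    relations: list[tuple[int, int, str]] = field(default_factory=list)


def preserves_structure(orig: Instance, para: Instance) -> bool:
    """Hypothetical filter: accept a paraphrase only if its annotation skeleton is unchanged."""
    # Same sequence of component labels
    if [c.label for c in orig.components] != [c.label for c in para.components]:
        return False
    # Same set of relations between component indices
    if sorted(orig.relations) != sorted(para.relations):
        return False
    # Every paraphrased component must still occur in the new text, in order and non-overlapping
    pos = 0
    for comp in para.components:
        idx = para.text.find(comp.text, pos)
        if idx < 0:
            return False
        pos = idx + len(comp.text)
    return True


def build_training_set(original, quality_pairs, diversity):
    """Mix original data with filtered quality-oriented and diversity-oriented synthetic data.

    quality_pairs: list of (original_instance, paraphrased_instance) tuples (assumed format).
    """
    kept = [para for orig, para in quality_pairs if preserves_structure(orig, para)]
    return list(original) + kept + list(diversity)
```

For example, a paraphrase that rewords a claim and its supporting premise but keeps both components and the support relation passes the filter, while one that drops the premise is discarded before augmentation.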
Anthology ID: 2025.emnlp-main.1351
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 26592–26615
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1351/
Cite (ACL): Jianzhu Bao, Yuqi Huang, Yang Sun, Wenya Wang, Yice Zhang, Bojun Jin, and Ruifeng Xu. 2025. Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26592–26615, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining (Bao et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1351.pdf
Checklist: 2025.emnlp-main.1351.checklist.pdf