Self-supervised Data Augmentation for Text Classification in Low-Data Settings

Deyu Ding, Mengying Wang, Andreas Spitz


Abstract
Due to data sparsity and high annotation cost, data augmentation has established itself as an effective tool for boosting model performance on supervised NLP tasks. Whereas task-agnostic augmentation methods tend to act as simple regularizers for the data, task-aware methods also leverage labels to generate data that are most suitable for downstream tasks. While prior work has investigated generation and sampling strategies individually, the potential of a self-supervised approach that leverages multiple pre-trained models in both generation and sampling remains underexplored. To address this gap, we present an ensemble-based framework of language models that proposes augmentation candidates and internally reviews their suitability for low-resource text classification tasks. We evaluate our model on six classification benchmarks and find that it consistently outperforms state-of-the-art data augmentation baselines, improving classification accuracy by an average of 0.97 points in low-data scenarios.
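The abstract describes the propose-then-review loop only at a high level; the paper's actual generators, reviewers, and suitability criteria are not given here. Below is a minimal, hypothetical Python sketch of that general pattern, assuming stand-in generators (toy synonym substitution and word dropout) and a stand-in internal reviewer (label agreement among ensemble predictors). All names and components in the sketch are illustrative assumptions, not the authors' method.

import random

# Hypothetical stand-in generators: in the paper these would be
# pre-trained language models proposing augmentation candidates.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

def synonym_generator(text):
    """Propose candidates by swapping in synonyms for known words."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        for s in SYNONYMS.get(w.lower(), []):
            out.append(" ".join(words[:i] + [s] + words[i + 1:]))
    return out

def dropout_generator(text):
    """Propose a candidate by deleting one random word."""
    words = text.split()
    if len(words) < 2:
        return []
    i = random.randrange(len(words))
    return [" ".join(words[:i] + words[i + 1:])]

def review(candidate, label, predictors, threshold=1.0):
    """Internal review: keep a candidate only if enough ensemble
    members agree it still carries the original label."""
    votes = sum(1 for p in predictors if p(candidate) == label)
    return votes / len(predictors) >= threshold

def augment(example, label, generators, predictors):
    """Generate candidates with all generators, then filter by review."""
    candidates = [c for g in generators for c in g(example)]
    return [c for c in candidates if review(c, label, predictors)]

if __name__ == "__main__":
    # Toy predictors standing in for pre-trained classifiers.
    predictors = [
        lambda t: "pos" if any(w in t for w in ("great", "good", "fine")) else "neg",
        lambda t: "neg" if any(w in t for w in ("bad", "poor", "awful")) else "pos",
    ]
    kept = augment("the movie was good", "pos",
                   [synonym_generator, dropout_generator], predictors)
    print(kept)

The unanimity threshold above is the simplest possible internal review; the paper's actual suitability criterion for candidates may differ.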
Anthology ID:
2026.lrec-main.788
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resources Association
Pages:
10046–10056
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.788/
Cite (ACL):
Deyu Ding, Mengying Wang, and Andreas Spitz. 2026. Self-supervised Data Augmentation for Text Classification in Low-Data Settings. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 10046–10056, Palma de Mallorca, Spain. ELRA Language Resources Association.
Cite (Informal):
Self-supervised Data Augmentation for Text Classification in Low-Data Settings (Ding et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.788.pdf