Abstract
Despite its proven efficiency in other fields, data augmentation is less popular in the context of natural language processing (NLP) due to its complexity and limited results. A recent study (Longpre et al., 2020) showed for example that task-agnostic data augmentations fail to consistently boost the performance of pretrained transformers even in low data regimes. In this paper, we investigate whether data-driven augmentation scheduling and the integration of a wider set of transformations can lead to improved performance where fixed and limited policies were unsuccessful. Our results suggest that, while this approach can help the training process in some settings, the improvements are unsubstantial. This negative result is meant to help researchers better understand the limitations of data augmentation for NLP.
- Anthology ID:
- 2021.insights-1.14
- Volume:
- Proceedings of the Second Workshop on Insights from Negative Results in NLP
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- João Sedoc, Anna Rogers, Anna Rumshisky, Shabnam Tafreshi
- Venue:
- insights
- Publisher:
- Association for Computational Linguistics
- Pages:
- 89–102
- URL:
- https://aclanthology.org/2021.insights-1.14
- DOI:
- 10.18653/v1/2021.insights-1.14
- Cite (ACL):
- Daphné Chopard, Matthias S. Treder, and Irena Spasić. 2021. Learning Data Augmentation Schedules for Natural Language Processing. In Proceedings of the Second Workshop on Insights from Negative Results in NLP, pages 89–102, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Learning Data Augmentation Schedules for Natural Language Processing (Chopard et al., insights 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.insights-1.14.pdf
- Code:
- chopardda/ldas-nlp
- Data:
- MultiNLI, SST, SST-2