Abstract
Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a performance drop on the augmented data (for example, EDA generally loses approximately 2% in aspect-based sentiment classification). To address this problem, we propose a hybrid instance-filtering framework (BoostAug) based on pre-trained language models that maintains a feature space similar to that of the natural dataset. BoostAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves augmentation performance, by 2–3% in classification accuracy. Our experimental results on three classification tasks and nine public datasets show that BoostAug resolves the performance drop and outperforms state-of-the-art text augmentation methods. Additionally, we release the code to help improve existing augmentation methods on large datasets.
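To make the core idea concrete, below is a minimal, hypothetical sketch of PLM-based instance filtering: augmented candidates are embedded with a pre-trained encoder and kept only if they stay close to the original instance in feature space. This is not the authors' released BoostAug implementation; the encoder choice (`bert-base-uncased`), mean pooling, and the similarity threshold are assumptions for illustration.

```python
# Hypothetical sketch of feature-space instance filtering, NOT the official
# BoostAug code. Model name, pooling, and threshold are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Mean-pool the last hidden states to get one vector per sentence.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)      # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)       # (B, H)

def filter_augmentations(original, candidates, threshold=0.9):
    # Keep candidates whose embedding stays close to the original instance,
    # discarding augmentations whose feature space has shifted too far.
    vecs = embed([original] + candidates)
    sims = torch.nn.functional.cosine_similarity(vecs[1:], vecs[0:1])
    return [c for c, s in zip(candidates, sims.tolist()) if s >= threshold]

kept = filter_augmentations(
    "The battery life is great.",
    ["The battery lifespan is great.", "The weather is great."],
)
print(kept)  # expected to keep the close paraphrase and drop the shifted one
```

In this sketch the filter is applied per instance; the threshold trades off augmentation volume against feature-space fidelity, which is the tension the abstract describes.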
- Anthology ID:
- 2023.findings-acl.105
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1652–1669
- URL:
- https://aclanthology.org/2023.findings-acl.105
- DOI:
- 10.18653/v1/2023.findings-acl.105
- Cite (ACL):
- Heng Yang and Ke Li. 2023. Boosting Text Augmentation via Hybrid Instance Filtering Framework. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1652–1669, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Boosting Text Augmentation via Hybrid Instance Filtering Framework (Yang & Li, Findings 2023)
- PDF:
- https://aclanthology.org/2023.findings-acl.105.pdf