Abstract
Data augmentation with mixup has been shown to be effective on NLP tasks. Despite its success, mixup still has shortcomings. First, vanilla mixup randomly selects one sample to pair with a given sample, and it remains unclear how best to choose the input samples for mixup. Second, linear interpolation limits the space of synthetic data and its regularization effect. In this paper, we propose dynamic nonlinear mixup with distance-based sample selection, which not only generates multiple sample pairs based on the distances between samples but also enlarges the space of synthetic samples. Specifically, we compute the distance between input samples using cosine similarity and select multiple samples for a given sample. We then use dynamic nonlinear mixup to fuse the sample pairs. Instead of a linear, scalar mixing strategy, it uses a nonlinear interpolation strategy in which the mixing strategy is adaptively updated for the input and label pairs. Experiments on multiple public datasets demonstrate that dynamic nonlinear mixup outperforms state-of-the-art methods.
- Anthology ID:
- 2022.coling-1.333
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 3788–3797
- URL:
- https://aclanthology.org/2022.coling-1.333
- Cite (ACL):
- Shaokang Zhang, Lei Jiang, and Jianlong Tan. 2022. Dynamic Nonlinear Mixup with Distance-based Sample Selection. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3788–3797, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Dynamic Nonlinear Mixup with Distance-based Sample Selection (Zhang et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.coling-1.333.pdf
- Data
- SST, SST-2
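
The two steps named in the abstract, selecting partner samples by cosine similarity and mixing each pair with something richer than a single scalar, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say whether near or far samples are preferred, nor what form the nonlinear interpolation takes, so the sketch assumes the k most similar samples are chosen and substitutes per-dimension mixing coefficients for the paper's adaptively updated strategy. The function names `select_partners` and `elementwise_mix`, the Beta(1, 1) sampling, and the label-weighting rule are all hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_partners(embeddings, index, k):
    """Return the indices of the k samples most similar to embeddings[index].

    Assumption: "most similar" is the selection rule; the abstract only says
    that selection is distance-based.
    """
    scored = [
        (cosine_similarity(embeddings[index], other), j)
        for j, other in enumerate(embeddings)
        if j != index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [j for _, j in scored[:k]]


def elementwise_mix(x_a, x_b, y_a, y_b, rng):
    """Mix two samples with one coefficient per input dimension.

    Assumption: per-dimension coefficients stand in for the paper's dynamic
    nonlinear mixing, which is learned and is not specified in the abstract.
    """
    lam = [rng.betavariate(1.0, 1.0) for _ in x_a]
    mixed_x = [l * a + (1.0 - l) * b for l, a, b in zip(lam, x_a, x_b)]
    # Hypothetical label rule: weight the labels by the mean input coefficient.
    weight = sum(lam) / len(lam)
    mixed_y = [weight * a + (1.0 - weight) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y


if __name__ == "__main__":
    rng = random.Random(0)
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    for j in select_partners(embeddings, 0, k=2):
        print(elementwise_mix(embeddings[0], embeddings[j], labels[0], labels[j], rng))
```

Generating several pairs per sample, as `select_partners` does with k > 1, mirrors the abstract's point that one random partner per sample is a limitation of vanilla mixup.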