Sibylvariant Transformations for Robust Text Classification
Fabrice Harel-Canada, Muhammad Ali Gulzar, Nanyun Peng, Miryung Kim
Abstract
The vast majority of text transformation techniques in NLP are inherently limited in their ability to expand input space coverage due to an implicit constraint to preserve the original class label. In this work, we propose the notion of sibylvariance (SIB) to describe the broader set of transforms that relax the label-preserving constraint, knowably vary the expected class, and lead to significantly more diverse input distributions. We offer a unified framework to organize all data transformations, including two types of SIB: (1) Transmutations convert one discrete kind into another, (2) Mixture Mutations blend two or more classes together. To explore the role of sibylvariance within NLP, we implemented 41 text transformations, including several novel techniques like Concept2Sentence and SentMix. Sibylvariance also enables a unique form of adaptive training that generates new input mixtures for the most confused class pairs, challenging the learner to differentiate with greater nuance. Our experiments on six benchmark datasets strongly support the efficacy of sibylvariance for generalization performance, defect detection, and adversarial robustness.
- Anthology ID:
- 2022.findings-acl.140
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1771–1788
- URL:
- https://aclanthology.org/2022.findings-acl.140
- DOI:
- 10.18653/v1/2022.findings-acl.140
- Cite (ACL):
- Fabrice Harel-Canada, Muhammad Ali Gulzar, Nanyun Peng, and Miryung Kim. 2022. Sibylvariant Transformations for Robust Text Classification. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1771–1788, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Sibylvariant Transformations for Robust Text Classification (Harel-Canada et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.findings-acl.140.pdf
- Code:
- ucla-seal/sibyl
- Data:
- AG News, IMDb Movie Reviews, SST, SST-2, Yahoo! Answers
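
The abstract describes Mixture Mutations as transforms that blend two or more classes together, so the expected label changes in a known way rather than being preserved. The following is a minimal sketch of that idea, not the authors' implementation and not the API of ucla-seal/sibyl: the function name `mix_texts`, the naive period-based sentence splitting, and the choice to weight the soft label by word count are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def mix_texts(text_a: str, label_a: int, text_b: str, label_b: int,
              num_classes: int, seed: int = 0) -> Tuple[str, List[float]]:
    """Hypothetical mixture mutation: blend two labeled texts.

    Returns the mixed text and a soft label whose mass is split between
    the two source classes in proportion to each text's word count
    (an assumed weighting, chosen for simplicity).
    """
    rng = random.Random(seed)

    # Naive sentence split on periods; a real pipeline would use a tokenizer.
    sents_a = [s.strip() for s in text_a.split(".") if s.strip()]
    sents_b = [s.strip() for s in text_b.split(".") if s.strip()]

    len_a = sum(len(s.split()) for s in sents_a)
    len_b = sum(len(s.split()) for s in sents_b)
    total = len_a + len_b
    if total == 0:
        raise ValueError("both input texts are empty")

    # Interleave the sentences of both inputs in random order.
    sentences = sents_a + sents_b
    rng.shuffle(sentences)
    mixed = ". ".join(sentences) + "."

    # The new expected label is a known mixture of the two source labels.
    soft_label = [0.0] * num_classes
    soft_label[label_a] += len_a / total
    soft_label[label_b] += len_b / total
    return mixed, soft_label


mixed, soft = mix_texts("Great film. Loved it.", 1, "Terrible plot.", 0,
                        num_classes=2)
print(mixed)
print(soft)
```

In this sketch the positive review contributes four of the six words, so the soft label assigns two thirds of its mass to class 1 and one third to class 0. A classifier trained on such examples would need a loss that accepts soft targets, such as cross-entropy against a probability vector.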