Abstract
Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is defined purely on input texts, without using any human-provided labels. Training a model using an SSL task can prevent the model from being overfitted to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
- Anthology ID:
- 2021.tacl-1.39
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 9
- Year:
- 2021
- Address:
- Cambridge, MA
- Editors:
- Brian Roark, Ani Nenkova
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 641–656
- URL:
- https://aclanthology.org/2021.tacl-1.39
- DOI:
- 10.1162/tacl_a_00389
- Cite (ACL):
- Meng Zhou, Zechen Li, and Pengtao Xie. 2021. Self-supervised Regularization for Text Classification. Transactions of the Association for Computational Linguistics, 9:641–656.
- Cite (Informal):
- Self-supervised Regularization for Text Classification (Zhou et al., TACL 2021)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2021.tacl-1.39.pdf
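
The abstract describes training a supervised classification task and an unsupervised SSL task simultaneously. One common way to realize such joint training is a weighted sum of the two losses; the sketch below illustrates that idea only and is not taken from the authors' code. The function names, the `ssl_weight` parameter and its default value, and the choice of masked-token prediction as the SSL task are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def ssl_reg_loss(cls_logits, label, mlm_logits, masked_token_ids, ssl_weight=0.1):
    """Hypothetical joint objective: supervised loss + ssl_weight * SSL loss.

    cls_logits       -- class scores for one input text
    label            -- index of the human-provided class label
    mlm_logits       -- one list of vocabulary scores per masked position
                        (assumed SSL task: predicting masked tokens)
    masked_token_ids -- original token ids at the masked positions
    ssl_weight       -- assumed trade-off weight (not given in the abstract)
    """
    supervised = cross_entropy(cls_logits, label)
    if masked_token_ids:
        # The SSL term needs only the input text, never the class label.
        ssl = sum(
            cross_entropy(scores, token)
            for scores, token in zip(mlm_logits, masked_token_ids)
        ) / len(masked_token_ids)
    else:
        ssl = 0.0
    return supervised + ssl_weight * ssl
```

Setting `ssl_weight` to 0 recovers plain supervised training, so under these assumptions the SSL term acts purely as an added regularizer on the shared model.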