Semi-Supervised Text Classification with Balanced Deep Representation Distributions

Changchun Li, Ximing Li, Jihong Ouyang


Abstract
Semi-Supervised Text Classification (SSTC) methods mainly work in the spirit of self-training. They initialize a deep classifier by training on labeled texts, and then alternate between predicting pseudo-labels for unlabeled texts and training the deep classifier on the mixture of labeled and pseudo-labeled texts. Naturally, their performance is largely determined by the accuracy of the pseudo-labels for unlabeled texts. Unfortunately, these methods often suffer from low accuracy because of the margin bias problem, which is caused by large differences between the representation distributions of labels in SSTC. To alleviate this problem, we apply the angular margin loss and perform a Gaussian linear transformation to achieve balanced label angle variances, where a label angle variance is the variance of the label angles of texts within the same label. Constraining all label angle variances to be balanced yields more accurate pseudo-labels; the variances are estimated over both labeled and pseudo-labeled texts during the self-training loops. With this insight, we propose a novel SSTC method, namely Semi-Supervised Text Classification with Balanced Deep representation Distributions (S2TC-BDD). To evaluate S2TC-BDD, we compare it against state-of-the-art SSTC methods. Empirical results demonstrate the effectiveness of S2TC-BDD, especially when labeled texts are scarce.
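To make the notion of a label angle variance concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes each label has a prototype vector (for example, a row of the classifier's weight matrix), measures the angle between each text representation and the prototype of its (pseudo-)label, and then linearly rescales the angles so that every label shares one variance. The function names, the toy data, and the choice of the mean variance as the common target are all illustrative assumptions.

```python
import numpy as np

def label_angles(features, prototypes, labels):
    """Angle (radians) between each representation and its label's prototype."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cos = np.sum(f * w[labels], axis=1)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angle_stats(angles, labels, num_labels):
    """Per-label mean and variance of the label angles."""
    means = np.array([angles[labels == k].mean() for k in range(num_labels)])
    variances = np.array([angles[labels == k].var() for k in range(num_labels)])
    return means, variances

def balance_angles(angles, labels, means, variances, target_var=None, eps=1e-12):
    """Linearly rescale angles around each label mean to a shared variance.

    Assumption: the shared target defaults to the mean of the per-label variances.
    """
    if target_var is None:
        target_var = variances.mean()
    scale = np.sqrt(target_var / (variances + eps))
    return (angles - means[labels]) * scale[labels] + means[labels]

# Toy data (hypothetical): label 0 is tightly clustered, label 1 is widely spread.
rng = np.random.default_rng(0)
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.repeat([0, 1], 200)
spread = np.where(labels[:, None] == 0, 0.1, 0.5)
features = prototypes[labels] + spread * rng.normal(size=(400, 2))

angles = label_angles(features, prototypes, labels)
means, variances = angle_stats(angles, labels, 2)
balanced = balance_angles(angles, labels, means, variances)
new_means, new_variances = angle_stats(balanced, labels, 2)
```

In a self-training loop, `labels` would hold gold labels for labeled texts and current pseudo-labels for unlabeled texts, so the statistics would be re-estimated each round. How the balanced angles feed into the angular margin loss is specified in the paper and is not reproduced here.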
Anthology ID:
2021.acl-long.391
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
5044–5053
URL:
https://aclanthology.org/2021.acl-long.391
DOI:
10.18653/v1/2021.acl-long.391
Cite (ACL):
Changchun Li, Ximing Li, and Jihong Ouyang. 2021. Semi-Supervised Text Classification with Balanced Deep Representation Distributions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5044–5053, Online. Association for Computational Linguistics.
Cite (Informal):
Semi-Supervised Text Classification with Balanced Deep Representation Distributions (Li et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2021.acl-long.391.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-3/2021.acl-long.391.mp4
Data:
AG News