Abstract
We propose a semi-supervised text classifier based on self-training that exploits one positive and one negative property of neural networks. One of the weaknesses of self-training is the semantic drift problem, where noisy pseudo-labels accumulate over iterations and the error rate consequently soars. To tackle this challenge, we reshape the role of pseudo-labels and create a hierarchical order of information. In addition, a crucial step in self-training is to use the classifier's confidence to select the best candidate pseudo-labels. This step cannot be done effectively by neural networks, because their outputs are known to be poorly calibrated. To overcome this challenge, we propose a hybrid metric that replaces the plain confidence measurement. Our metric accounts for prediction uncertainty via a subsampling technique. We evaluate our model on five standard benchmarks and show that it significantly outperforms ten diverse baseline models. Furthermore, we show that the improvement achieved by our model is additive to language model pretraining, a widely used technique for exploiting unlabeled documents.
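To make the pseudo-label selection idea concrete, below is a minimal sketch of uncertainty-aware candidate selection via subsampling. It is an illustration rather than the paper's released code: the function names (`subsample_predictions`, `hybrid_score`, `select_pseudo_labels`), the sklearn-style classifier, and the confidence-minus-disagreement scoring rule are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's implementation): score unlabeled examples
# for pseudo-labeling by combining the classifier's confidence with an
# uncertainty estimate obtained from models trained on subsamples of the
# labeled data. Numbers and names here are hypothetical choices.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def subsample_predictions(base_clf, X_lab, y_lab, X_unlab, n_models=5, frac=0.8, seed=0):
    """Train copies of the classifier on random subsamples of the labeled set
    and collect their class-probability predictions on the unlabeled set."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.choice(len(X_lab), size=int(frac * len(X_lab)), replace=False)
        clf = clone(base_clf).fit(X_lab[idx], y_lab[idx])
        preds.append(clf.predict_proba(X_unlab))
    return np.stack(preds)  # shape: (n_models, n_unlabeled, n_classes)


def hybrid_score(preds):
    """Combine mean confidence with disagreement across the subsampled models."""
    mean_probs = preds.mean(axis=0)
    confidence = mean_probs.max(axis=1)           # plain confidence of the averaged prediction
    uncertainty = preds.max(axis=2).std(axis=0)   # spread of the top-class probability
    return confidence - uncertainty, mean_probs.argmax(axis=1)


def select_pseudo_labels(base_clf, X_lab, y_lab, X_unlab, k=100):
    """Return indices and labels of the k unlabeled examples with the best hybrid score."""
    preds = subsample_predictions(base_clf, X_lab, y_lab, X_unlab)
    scores, labels = hybrid_score(preds)
    top = np.argsort(-scores)[:k]
    return top, labels[top]


if __name__ == "__main__":
    # Toy run with random features standing in for text representations.
    rng = np.random.default_rng(1)
    X_lab, y_lab = rng.normal(size=(40, 16)), rng.integers(0, 2, size=40)
    X_unlab = rng.normal(size=(200, 16))
    idx, pseudo = select_pseudo_labels(LogisticRegression(max_iter=200), X_lab, y_lab, X_unlab, k=10)
    print(idx, pseudo)
```

In a self-training loop, the selected examples would be added to the labeled set with their predicted labels and the process repeated; the disagreement term is one simple way to down-weight candidates whose confidence is not stable across subsamples.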
- Anthology ID: 2023.findings-acl.769
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 12148–12162
- URL: https://aclanthology.org/2023.findings-acl.769
- DOI: 10.18653/v1/2023.findings-acl.769
- Cite (ACL): Payam Karisani. 2023. Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12148–12162, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets (Karisani, Findings 2023)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/2023.findings-acl.769.pdf