Abstract
We propose a novel and simple method for semi-supervised text classification. The method stems from the hypothesis that a classifier with pretrained word embeddings always outperforms the same classifier with randomly initialized word embeddings, as empirically observed in NLP tasks. Our method first builds two sets of classifiers as a form of model ensemble and initializes their word embeddings differently: one set with random embeddings, the other with pretrained word embeddings. Following the self-training framework, we focus on the unlabeled examples on which the two classifiers make different predictions. We also use early stopping in meta-epochs to improve the performance of our method. Our method, Delta-training, outperforms the self-training and co-training frameworks on four text classification datasets, showing robustness against error accumulation.
- Anthology ID:
- D19-1347
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3458–3463
- URL:
- https://aclanthology.org/D19-1347
- DOI:
- 10.18653/v1/D19-1347
- Cite (ACL):
- Hwiyeol Jo and Ceyda Cinarel. 2019. Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3458–3463, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings (Jo & Cinarel, EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/D19-1347.pdf
- Data
- AG News, IMDb Movie Reviews
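
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_delta_examples` and the two predictor callables are hypothetical, and pseudo-labeling each disagreement with the pretrained-embedding classifier's prediction is an assumption motivated by the abstract's hypothesis that this classifier is the stronger one.

```python
from typing import Callable, List, Sequence, Tuple


def select_delta_examples(
    texts: Sequence[str],
    predict_random: Callable[[str], int],
    predict_pretrained: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Return (text, pseudo_label) pairs for which the two classifiers disagree.

    Assumption: the pseudo-label is the prediction of the classifier with
    pretrained embeddings. Both predictors are hypothetical stand-ins for
    trained models.
    """
    selected = []
    for text in texts:
        label_random = predict_random(text)
        label_pretrained = predict_pretrained(text)
        # Keep only the examples where the two predictions differ.
        if label_random != label_pretrained:
            selected.append((text, label_pretrained))
    return selected


# Toy stand-ins for the two classifiers (illustrative only).
def toy_random(text: str) -> int:
    return 0


def toy_pretrained(text: str) -> int:
    return 1 if "good" in text else 0


unlabeled = ["good film", "dull plot"]
pseudo_labeled = select_delta_examples(unlabeled, toy_random, toy_pretrained)
```

In a self-training loop, pairs selected this way would be added to the labeled set before retraining both classifiers in the next meta-epoch. The stopping criterion is left out here because the abstract does not specify it.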