PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

Yunyi Zhang; Minhao Jiang; Yu Meng; Yu Zhang; Jiawei Han

doi:10.18653/v1/2023.emnlp-main.780

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

Abstract

Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts. Most existing methods first use the label names as static keyword-based features to generate pseudo labels, which are then used for final classifier training. While reasonable, such a commonly adopted framework suffers from two limitations: (1) keywords can have different meanings in different contexts and some text may not have any keyword, so keyword matching can induce noisy and inadequate pseudo labels; (2) the errors made in the pseudo label generation stage will directly propagate to the classifier training stage without a chance of being corrected. In this paper, we propose a new method, PIEClass, consisting of two modules: (1) a pseudo label acquisition module that uses zero-shot prompting of pre-trained language models (PLM) to get pseudo labels based on contextualized text understanding beyond static keyword matching, and (2) a noise-robust iterative ensemble training module that iteratively trains classifiers and updates pseudo labels by utilizing two PLM fine-tuning methods that regularize each other. Extensive experiments show that PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets and even achieves similar performance to fully-supervised classifiers on sentiment classification tasks.

Anthology ID:: 2023.emnlp-main.780
Volume:: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:: December
Year:: 2023
Address:: Singapore
Editors:: Houda Bouamor, Juan Pino, Kalika Bali
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 12655–12670
Language:
URL:: https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.780/
DOI:: 10.18653/v1/2023.emnlp-main.780
Bibkey:
Cite (ACL):: Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, and Jiawei Han. 2023. PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12655–12670, Singapore. Association for Computational Linguistics.
Cite (Informal):: PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training (Zhang et al., EMNLP 2023)
Copy Citation:
PDF:: https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.780.pdf
Video:: https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.780.mp4

PDF Cite Search Video Fix data