LIME: Weakly-Supervised Text Classification without Seeds

Seongmin Park; Jihwa Lee


Abstract

In weakly-supervised text classification, only label names act as sources of supervision. Predominant approaches to weakly-supervised text classification use a two-phase framework, in which test samples are first assigned pseudo-labels and then used to train a neural text classifier. In most previous work, the pseudo-labeling step depends on obtaining seed words that best capture the relevance of each class label. We present LIME, a framework for weakly-supervised text classification that entirely replaces the brittle seed-word generation process with entailment-based pseudo-classification. We find that combining weakly-supervised classification and textual entailment mitigates the shortcomings of both, resulting in a more streamlined and effective classification pipeline. With just an off-the-shelf textual entailment model, LIME outperforms recent baselines in weakly-supervised text classification and achieves state-of-the-art results on 4 benchmarks.
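The entailment-based pseudo-labeling step can be illustrated with a minimal sketch. It is not the LIME implementation: it assumes a hypothetical `entail_prob(premise, hypothesis)` callable standing in for an off-the-shelf NLI model, a hypothetical hypothesis template "This text is about {}.", and a hypothetical confidence cutoff `min_score`, none of which come from the paper. Each text is assigned the label whose hypothesis is judged most entailed, and the surviving (text, label) pairs would then serve as training data for the downstream classifier.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical hypothesis template; not taken from the paper.
DEFAULT_TEMPLATE = "This text is about {}."


def pseudo_label(
    texts: Sequence[str],
    label_names: Sequence[str],
    entail_prob: Callable[[str, str], float],
    template: str = DEFAULT_TEMPLATE,
    min_score: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Label each text with the class whose hypothesis is most entailed.

    entail_prob(premise, hypothesis) is assumed to return an entailment
    probability from some NLI model. Texts whose best score falls below the
    hypothetical min_score cutoff are dropped, so they never reach the
    downstream classifier's training set.
    """
    labeled = []
    for text in texts:
        # One hypothesis per label name, with the text itself as the premise.
        scores = {name: entail_prob(text, template.format(name)) for name in label_names}
        best = max(scores, key=scores.get)  # ties go to the earlier label
        if scores[best] >= min_score:
            labeled.append((text, best, scores[best]))
    return labeled


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Keyword stand-in for an NLI model, for demonstration only."""
    # Take the label word at the end of the template ("... about sports.").
    label_word = hypothesis.rstrip(".").split()[-1].lower()
    return 1.0 if label_word in premise.lower() else 0.0


if __name__ == "__main__":
    docs = [
        "The sports desk covered the final match.",
        "Business confidence dropped as shares fell.",
        "Rain is expected tomorrow.",
    ]
    for text, label, score in pseudo_label(
        docs, ["sports", "business"], toy_entail_prob, min_score=0.5
    ):
        print(label, score, text)
```

With `min_score=0.5`, the third document matches no label and is dropped. Replacing `toy_entail_prob` with a real NLI model's entailment probability is the only change the sketch needs; the keyword scorer exists only so the example runs without a model download.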

Anthology ID: 2022.coling-1.91
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 1083–1088
URL: https://aclanthology.org/2022.coling-1.91
Cite (ACL): Seongmin Park and Jihwa Lee. 2022. LIME: Weakly-Supervised Text Classification without Seeds. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1083–1088, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): LIME: Weakly-Supervised Text Classification without Seeds (Park & Lee, COLING 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.91.pdf
Code: seongminp/lime
Data: AG News
