Abstract
Extremely weakly-supervised text classification aims to classify texts without any labeled data, relying only on class names as supervision. Existing work includes prompt-based and seed-based methods. Prompt-based methods prompt a language model with instructions, while seed-based methods generate pseudo-labels by word matching. Both have significant flaws, including zero-shot instability and context-dependent ambiguity. This paper introduces SetSync, which follows a new wordset-based paradigm that avoids these problems. In SetSync, a class is represented by wordsets, and pseudo-labels are generated by wordset matching. To facilitate this, we propose using the information bottleneck to identify class-relevant wordsets. Moreover, we treat classifier training as a hybrid of semi-supervised learning and learning with noisy labels, and propose a new training strategy, termed sync-denoising. Extensive experiments on 11 datasets show that SetSync outperforms all existing prompt-based and seed-based methods, exceeding SOTA by an average of 8 points.
- Anthology ID:
- 2024.naacl-long.397
- Volume:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7167–7179
- URL:
- https://aclanthology.org/2024.naacl-long.397
- Cite (ACL):
- Lysa Xiao. 2024. Extremely Weakly-supervised Text Classification with Wordsets Mining and Sync-Denoising. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7167–7179, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Extremely Weakly-supervised Text Classification with Wordsets Mining and Sync-Denoising (Xiao, NAACL 2024)
- PDF:
- https://preview.aclanthology.org/fix-volume-bibkeys/2024.naacl-long.397.pdf