Weakly Supervised Text Classification using Supervision Signals from a Language Model
Ziqian Zeng, Weimin Ni, Tianqing Fang, Xiang Li, Xinran Zhao, Yangqiu Song
Abstract
Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt which combines the document itself and “this article is talking about [MASK].” A masked language model can generate words for the [MASK] token. The generated words, which summarize the content of the document, can be used as supervision signals. We propose a latent variable model that simultaneously learns a word distribution learner, which associates generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method can outperform baselines by 2%, 4%, and 3%, respectively.
- Anthology ID:
- 2022.findings-naacl.176
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2022
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2295–2305
- URL:
- https://aclanthology.org/2022.findings-naacl.176
- DOI:
- 10.18653/v1/2022.findings-naacl.176
- Cite (ACL):
- Ziqian Zeng, Weimin Ni, Tianqing Fang, Xiang Li, Xinran Zhao, and Yangqiu Song. 2022. Weakly Supervised Text Classification using Supervision Signals from a Language Model. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2295–2305, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- Weakly Supervised Text Classification using Supervision Signals from a Language Model (Zeng et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.findings-naacl.176.pdf
- Code:
- hkust-knowcomp/wddc
- Data:
- AG News
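
The cloze-style prompt described in the abstract can be illustrated with a minimal sketch. It only shows how a document might be combined with the quoted suffix; the function name `build_cloze_prompt`, the whitespace handling, and the single-space separator are assumptions made for illustration and are not taken from the paper or its released code. The latent variable model that maps generated words to categories is not reproduced here.

```python
# Suffix quoted in the abstract; [MASK] is the slot a masked language model fills.
PROMPT_SUFFIX = "this article is talking about [MASK]."


def build_cloze_prompt(document: str, suffix: str = PROMPT_SUFFIX) -> str:
    """Combine a document with the cloze-style suffix (hypothetical helper).

    Assumption: the document and the suffix are joined by a single space
    after trimming surrounding whitespace from the document.
    """
    return f"{document.strip()} {suffix}"


if __name__ == "__main__":
    prompt = build_cloze_prompt("  Stocks rallied on Monday.  ")
    print(prompt)
    # The resulting string could then be passed to a masked language model;
    # for example, the Hugging Face `pipeline("fill-mask", ...)` API returns
    # candidate words for the [MASK] token. Which model and how many
    # candidates to keep are choices this sketch does not make.
```

The words predicted for the [MASK] slot would serve as the supervision signals the abstract refers to.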