Contextualized Weak Supervision for Text Classification

Dheeraj Mekala, Jingbo Shang


Abstract
Weakly supervised text classification based on a few user-provided seed words has recently attracted much attention from researchers. Existing methods mainly generate pseudo-labels in a context-free manner (e.g., string matching); as a result, the ambiguous, context-dependent nature of human language has long been overlooked. In this paper, we propose a novel framework, ConWea, that provides contextualized weak supervision for text classification. Specifically, we leverage contextualized representations of word occurrences and seed word information to automatically differentiate multiple interpretations of the same word, and thus create a contextualized corpus. This contextualized corpus is further utilized to train the classifier and expand seed words in an iterative manner. This process not only adds new contextualized, highly label-indicative keywords but also disambiguates the initial seed words, making our weak supervision fully contextualized. Extensive experiments and case studies on real-world datasets demonstrate the necessity and significant advantages of using contextualized weak supervision, especially when the class labels are fine-grained.
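The contrast the abstract draws between context-free and contextualized pseudo-labeling can be illustrated with a minimal sketch. This is not the authors' implementation: the function `pseudo_label`, the seed sets, the example sentence, and the `$k` sense suffix are all assumptions made for illustration, and the sense tags are written by hand here, whereas the framework derives them automatically from contextualized representations.

```python
from collections import Counter


def pseudo_label(tokens, seeds):
    """Return the class whose seed words occur most often in `tokens`.

    Returns None when no seed word matches or when the top two classes tie.
    """
    counts = Counter()
    for label, words in seeds.items():
        counts[label] = sum(1 for token in tokens if token in words)
    ranked = counts.most_common()
    if not ranked or ranked[0][1] == 0:
        return None  # no seed word matched
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: two classes share the top count
    return ranked[0][0]


# Context-free matching: "penalty" is a sports seed regardless of its sense.
context_free_seeds = {
    "sports": {"penalty", "goal"},
    "law": {"court", "judge"},
}
doc = "the jury debated the death penalty".split()
context_free_label = pseudo_label(doc, context_free_seeds)

# Contextualized matching (hypothetical): each occurrence of an ambiguous word
# carries a sense tag, so the two senses can be seeds for different classes.
contextual_seeds = {
    "sports": {"penalty$0", "goal"},
    "law": {"court", "judge", "penalty$1"},
}
tagged_doc = "the jury debated the death penalty$1".split()
contextual_label = pseudo_label(tagged_doc, contextual_seeds)

print(context_free_label, contextual_label)
```

In this toy case, plain string matching assigns the legal sentence to the sports class, while the sense-tagged version assigns it to the law class.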
Anthology ID:
2020.acl-main.30
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
323–333
URL:
https://aclanthology.org/2020.acl-main.30
DOI:
10.18653/v1/2020.acl-main.30
Cite (ACL):
Dheeraj Mekala and Jingbo Shang. 2020. Contextualized Weak Supervision for Text Classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 323–333, Online. Association for Computational Linguistics.
Cite (Informal):
Contextualized Weak Supervision for Text Classification (Mekala & Shang, ACL 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.acl-main.30.pdf
Video:
http://slideslive.com/38929298
Code:
dheeraj7596/ConWea