Partially Humanizing Weak Supervision: Towards a Better Low Resource Pipeline for Spoken Language Understanding

Ayush Kumar, Rishabh Tripathi, Jithendra Vepa


Abstract
Weak Supervised Learning (WSL) is a popular technique for developing machine learning models in the absence of labeled training data. WSL involves training on noisy labels, which are traditionally obtained from hand-engineered semantic rules and task-specific pre-trained models. Such rules offer limited coverage and limited generalization across tasks, while pre-trained models are available for only a limited set of tasks. Obtaining weak labels is thus a bottleneck in weak supervised learning. In this work, we propose to use the prompting paradigm to generate weak labels for the underlying tasks. We show that task-agnostic prompts are generalizable and can be used to obtain noisy labels for different Spoken Language Understanding (SLU) tasks such as sentiment classification, disfluency detection, and emotion classification. These prompts can additionally be updated with a human in the loop to add task-specific context, which provides the flexibility to design task-specific prompts. Our proposed WSL pipeline outperforms other competitive low-resource benchmarks on zero-shot and few-shot learning by more than 4% on Macro-F1, and a conventional rule-based WSL baseline by more than 5%, across all the benchmark datasets. We demonstrate that prompt-based methods save nearly 75% of the time in a weak-supervised framework and generate more reliable labels for the above SLU tasks, and thus can be used as a universal strategy to obtain weak labels.
Anthology ID:
2022.dash-1.9
Volume:
Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
DaSH
Publisher:
Association for Computational Linguistics
Pages:
64–73
URL:
https://aclanthology.org/2022.dash-1.9
Cite (ACL):
Ayush Kumar, Rishabh Tripathi, and Jithendra Vepa. 2022. Partially Humanizing Weak Supervision: Towards a Better Low Resource Pipeline for Spoken Language Understanding. In Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances), pages 64–73, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Partially Humanizing Weak Supervision: Towards a Better Low Resource Pipeline for Spoken Language Understanding (Kumar et al., DaSH 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.dash-1.9.pdf