Low-resource Interactive Active Labeling for Fine-tuning Language Models

Seiji Maekawa, Dan Zhang, Hannah Kim, Sajjadur Rahman, Estevam Hruschka

Abstract
Recently, active learning (AL) methods have been used to effectively fine-tune pre-trained language models for various NLP tasks such as sentiment analysis and document classification. However, for the task of fine-tuning language models, understanding how aspects such as labeling cost, sample acquisition latency, and dataset diversity affect AL methods requires deeper investigation. This paper examines the performance of existing AL methods in a low-resource, interactive labeling setting. We observe that existing methods often underperform in such a setting while exhibiting high latency and a lack of generalizability. To overcome these challenges, we propose a novel active learning method TYROUGE that employs a hybrid sampling strategy to minimize labeling cost and acquisition latency while providing a framework for adapting to dataset diversity via user guidance. Through our experiments, we observe that compared to SOTA methods, TYROUGE reduces the labeling cost by up to 43% and the acquisition latency by as much as 11X, while achieving comparable accuracy. Finally, we discuss the strengths and weaknesses of TYROUGE by exploring the impact of dataset characteristics.
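The abstract describes TYROUGE's hybrid sampling only at a high level. As a rough illustration of what a hybrid acquisition step can look like in general, the Python sketch below shortlists the most uncertain pool examples and then greedily diversifies within that shortlist. This is a hypothetical sketch and not the paper's actual algorithm: the function names, the entropy scoring, the farthest-point selection, and the `pool_factor` parameter are all illustrative assumptions.

```python
# Hypothetical hybrid (uncertainty + diversity) acquisition sketch.
# NOT the TYROUGE algorithm from the paper; all names and the
# two-stage heuristic are illustrative assumptions.
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per example; probs has shape (n, num_classes)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def hybrid_acquire(probs, embeddings, batch_size, pool_factor=5):
    """Pick a labeling batch: shortlist the most uncertain examples,
    then greedily diversify within the shortlist (farthest-point)."""
    # Stage 1: keep the pool_factor * batch_size most uncertain examples.
    scores = entropy(probs)
    shortlist = np.argsort(-scores)[: pool_factor * batch_size]

    # Stage 2: greedy max-min selection over the shortlist embeddings.
    emb = embeddings[shortlist]
    selected = [0]  # seed with the single most uncertain example
    min_dist = np.linalg.norm(emb - emb[0], axis=1)
    while len(selected) < batch_size:
        nxt = int(np.argmax(min_dist))  # farthest from everything picked
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(emb - emb[nxt], axis=1))
    return shortlist[selected]  # indices into the original pool

# Toy usage: 100 pooled examples, 3 classes, 32-dim embeddings.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(3), size=100)
e = rng.normal(size=(100, 32))
print(hybrid_acquire(p, e, batch_size=8))
```

A two-stage design like this trades the cost of scoring the whole pool against the cost of pairwise-diversity computation, which is kept to the shortlist; this is one plausible way a hybrid strategy can lower acquisition latency.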
Anthology ID:
2022.findings-emnlp.235
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3230–3242
URL:
https://aclanthology.org/2022.findings-emnlp.235
Cite (ACL):
Seiji Maekawa, Dan Zhang, Hannah Kim, Sajjadur Rahman, and Estevam Hruschka. 2022. Low-resource Interactive Active Labeling for Fine-tuning Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3230–3242, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Low-resource Interactive Active Labeling for Fine-tuning Language Models (Maekawa et al., Findings 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-emnlp.235.pdf