Abstract
Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available. To address this shortcoming, we propose STraTA, which stands for Self-Training with Task Augmentation, an approach that builds on two key ideas for effectively leveraging unlabeled data. First, STraTA uses task augmentation, a novel technique that synthesizes a large amount of data for auxiliary-task fine-tuning from target-task unlabeled texts. Second, STraTA performs self-training by further fine-tuning the strong base model created by task augmentation on a broad distribution of pseudo-labeled data. Our experiments demonstrate that STraTA can substantially improve sample efficiency across 12 few-shot benchmarks. Remarkably, on the SST-2 sentiment dataset, STraTA, with only 8 training examples per class, achieves comparable results to standard fine-tuning with 67K training examples. Our analyses reveal that task augmentation and self-training are both complementary and independently effective.
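To make the two-stage recipe concrete, the sketch below outlines a generic self-training loop of the kind the abstract describes: pseudo-label the unlabeled target-task texts with the current model, then fine-tune again on the few-shot examples plus the pseudo-labeled set. This is an illustration only, not the authors' implementation; the `fine_tune` hook is a hypothetical placeholder for a fine-tuning run that starts from the task-augmented base model, and the number of rounds is an assumption. The official code is in google-research/google-research.

```python
from typing import Callable, List, Tuple

# A labeled example is a (text, class index) pair.
Example = Tuple[str, int]

def self_train(
    fine_tune: Callable[[List[Example]], Callable[[str], int]],
    labeled: List[Example],
    unlabeled: List[str],
    rounds: int = 3,
) -> Callable[[str], int]:
    """Generic self-training loop in the spirit of STraTA's second stage.

    `fine_tune` is a hypothetical hook (not the authors' API): it should
    fine-tune the task-augmented base model on the given examples and
    return a text -> predicted-label function.
    """
    # Round 0: fine-tune on the few-shot labeled examples only.
    model = fine_tune(labeled)
    for _ in range(rounds):
        # Pseudo-label the broad distribution of unlabeled target-task texts.
        pseudo = [(text, model(text)) for text in unlabeled]
        # Fine-tune again on real labels plus pseudo-labels; each call starts
        # from the strong base model, so errors do not compound across rounds.
        model = fine_tune(labeled + pseudo)
    return model
```

In practice, `fine_tune` might wrap a BERT fine-tuning run where the base checkpoint has already been fine-tuned on auxiliary NLI-style data synthesized from the target task's unlabeled texts (the task-augmentation stage).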
- Anthology ID: 2021.emnlp-main.462
- Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 5715–5731
- URL: https://aclanthology.org/2021.emnlp-main.462
- DOI: 10.18653/v1/2021.emnlp-main.462
- Cite (ACL): Tu Vu, Minh-Thang Luong, Quoc Le, Grady Simon, and Mohit Iyyer. 2021. STraTA: Self-Training with Task Augmentation for Better Few-shot Learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5715–5731, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): STraTA: Self-Training with Task Augmentation for Better Few-shot Learning (Vu et al., EMNLP 2021)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/2021.emnlp-main.462.pdf
- Code: google-research/google-research
- Data: GLUE, MRPC, MultiNLI, QNLI, SNLI, SST