Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

Shiyang Li; Semih Yavuz; Wenhu Chen; Xifeng Yan

doi:10.18653/v1/2021.findings-emnlp.86

Abstract

Task-adaptive pre-training (TAPT) and self-training (ST) have emerged as the major semi-supervised approaches for improving natural language understanding (NLU) tasks with massive amounts of unlabeled data. However, it is unclear whether they learn similar representations or whether they can be effectively combined. In this paper, we show that TAPT and ST can be complementary under a simple protocol that follows the TAPT -> Finetuning -> Self-training (TFS) process. Experimental results show that the TFS protocol can effectively utilize unlabeled data to achieve strong combined gains consistently across six datasets covering sentiment classification, paraphrase identification, natural language inference, named entity recognition, and dialogue slot classification. We investigate various semi-supervised settings and consistently show that the gains from TAPT and ST can be strongly additive when following the TFS procedure. We hope that TFS can serve as an important semi-supervised baseline for future NLP studies.
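
The ordering of the three TFS stages can be sketched as below. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function `run_tfs` and its callable parameters `tapt`, `finetune`, and `predict` are hypothetical names introduced here, and the self-training step uses one common teacher/student pseudo-labeling form (student restarted from the TAPT checkpoint and trained on pseudo-labeled plus labeled data), which the abstract does not specify.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Example = str
Label = int
Labeled = Sequence[Tuple[Example, Label]]


def run_tfs(
    pretrained: Model,
    labeled: Labeled,
    unlabeled: Sequence[Example],
    tapt: Callable[[Model, Sequence[Example]], Model],
    finetune: Callable[[Model, Labeled], Model],
    predict: Callable[[Model, Example], Label],
) -> Model:
    """Run TAPT -> Finetuning -> Self-training with caller-supplied steps."""
    # Stage 1 (TAPT): continue pre-training on the unlabeled task text.
    adapted = tapt(pretrained, unlabeled)

    # Stage 2 (Finetuning): train a teacher on the labeled data.
    teacher = finetune(adapted, labeled)

    # Stage 3 (Self-training, assumed form): the teacher pseudo-labels the
    # unlabeled text, then a student that starts from the TAPT checkpoint
    # is trained on the pseudo-labeled and labeled examples together.
    pseudo = [(x, predict(teacher, x)) for x in unlabeled]
    student = finetune(adapted, pseudo + list(labeled))
    return student
```

Passing the training steps in as callables keeps the sketch independent of any particular model library; in practice `tapt` would wrap a masked-language-modeling loop and `finetune` a supervised training loop.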

Anthology ID: 2021.findings-emnlp.86
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1006–1015
URL: https://aclanthology.org/2021.findings-emnlp.86
DOI: 10.18653/v1/2021.findings-emnlp.86
Cite (ACL): Shiyang Li, Semih Yavuz, Wenhu Chen, and Xifeng Yan. 2021. Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1006–1015, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding (Li et al., Findings 2021)
PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2021.findings-emnlp.86.pdf
Video: https://preview.aclanthology.org/proper-vol2-ingestion/2021.findings-emnlp.86.mp4
Data: CoNLL 2003, GLUE, MultiNLI, QNLI, SST, SST-2
