Muppet: Massive Multi-task Representations with Pre-Finetuning
Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta
Abstract
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.
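As a rough illustration of the setup the abstract describes, the sketch below trains a shared pretrained encoder with one lightweight classification head per task on a mixture of batches drawn from several task datasets. The class and function names (MultiTaskModel, prefinetune_step), the head sizes, and the single-step loop are illustrative assumptions, not the paper's exact recipe; the paper's loss scaling, batching, and optimization details are given in the full text.

```python
import torch
from torch import nn
from transformers import AutoModel

# Illustrative multi-task pre-finetuning skeleton (not the paper's exact recipe):
# a shared pretrained encoder with one small head per task, updated on a mixture
# of batches from different tasks in a single optimization step.
class MultiTaskModel(nn.Module):
    def __init__(self, encoder_name: str, num_labels_per_task: dict):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in num_labels_per_task.items()}
        )

    def forward(self, task, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token representation
        return self.heads[task](pooled)


def prefinetune_step(model, batches_by_task, optimizer):
    """One update over a mixture of task batches: sum per-task losses, backprop once."""
    loss_fn = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    total = 0.0
    for task, batch in batches_by_task.items():
        logits = model(task, batch["input_ids"], batch["attention_mask"])
        total = total + loss_fn(logits, batch["labels"])
    total.backward()
    optimizer.step()
    return float(total)
```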
- Anthology ID:
- 2021.emnlp-main.468
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5799–5811
- URL:
- https://aclanthology.org/2021.emnlp-main.468
- DOI:
- 10.18653/v1/2021.emnlp-main.468
- Cite (ACL):
- Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, and Sonal Gupta. 2021. Muppet: Massive Multi-task Representations with Pre-Finetuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5799–5811, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Muppet: Massive Multi-task Representations with Pre-Finetuning (Aghajanyan et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.468.pdf
- Code
- facebook/muppet-roberta-base (see the loading sketch after the Data list) + additional community code
- Data
- ANLI, BoolQ, CNN/Daily Mail, CoLA, CommonsenseQA, GLUE, HellaSwag, MultiNLI, QNLI, RACE, RTE, Reddit, Reddit TIFU, SQuAD, SST, SWAG, SuperGLUE
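The released checkpoint listed under Code above is available on the Hugging Face hub, so a minimal way to experiment with the pre-finetuned representations is to load it with the transformers library, as sketched below. The two-label classification head and the example sentence are illustrative placeholders, not part of the release.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pre-finetuned RoBERTa-base checkpoint released with the paper and
# attach a freshly initialized 2-way classification head (illustrative only).
tokenizer = AutoTokenizer.from_pretrained("facebook/muppet-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/muppet-roberta-base", num_labels=2
)

inputs = tokenizer("Pre-finetuning improves sample efficiency.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```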