To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

Matthew E. Peters, Sebastian Ruder, Noah A. Smith


Abstract
While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.
Anthology ID:
W19-4302
Volume:
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Note:
Pages:
7–14
Language:
URL:
https://preview.aclanthology.org/icon-24-ingestion/W19-4302/
DOI:
10.18653/v1/W19-4302
Bibkey:
Cite (ACL):
Matthew E. Peters, Sebastian Ruder, and Noah A. Smith. 2019. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 7–14, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (Peters et al., RepL4NLP 2019)
Copy Citation:
PDF:
https://preview.aclanthology.org/icon-24-ingestion/W19-4302.pdf
Data
CoNLL 2003GLUEMRPCMultiNLISICKSSTSST-2