Rethinking Why Intermediate-Task Fine-Tuning Works

Ting-Yun Chang, Chi-Jen Lu


Abstract
Supplementary Training on Intermediate Labeled-data Tasks (STILT) is a widely applied technique that first fine-tunes a pretrained language model on an intermediate task before fine-tuning it on the target task of interest. While STILT can further improve the performance of pretrained language models, it is still unclear why and when it works. Previous research has shown that intermediate tasks involving complex inference, such as commonsense reasoning, work especially well for RoBERTa-large. In this paper, we discover that the improvement from an intermediate task can be orthogonal to whether it involves reasoning or other complex skills: a simple real-fake discrimination task synthesized with GPT-2 can benefit diverse target tasks. We conduct extensive experiments to study the impact of different factors on STILT. These findings suggest rethinking the role of intermediate fine-tuning in the STILT pipeline.
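To make the two-stage STILT pipeline described above concrete, the following is a minimal sketch (not the authors' released code; see the linked repository for that) assuming the Hugging Face transformers and datasets libraries. The toy examples, dataset contents, and hyperparameters are illustrative placeholders only; the point is that stage two is initialized from the stage-one checkpoint rather than from the original pretrained weights.

# Minimal sketch of intermediate-task fine-tuning (STILT):
# stage 1 fine-tunes a pretrained model on an intermediate task,
# stage 2 continues from that checkpoint on the target task.
# Data and hyperparameters below are toy placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "roberta-large"  # a smaller model can be substituted for quick tests
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def make_dataset(texts, labels):
    """Build and tokenize a tiny binary-classification dataset."""
    ds = Dataset.from_dict({"text": texts, "label": labels})
    return ds.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

def fine_tune(init_checkpoint, train_set, output_dir):
    """Run one fine-tuning stage and return the saved checkpoint directory."""
    model = AutoModelForSequenceClassification.from_pretrained(init_checkpoint, num_labels=2)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=8,
        learning_rate=1e-5,
    )
    Trainer(model=model, args=args, train_dataset=train_set, tokenizer=tokenizer).train()
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir

# Stage 1: intermediate task, e.g. discriminating human-written text from
# GPT-2 continuations (toy sentences stand in for the synthesized data).
intermediate_set = make_dataset(
    ["a sentence written by a person", "a continuation sampled from a language model"],
    [1, 0],
)
intermediate_ckpt = fine_tune(MODEL_NAME, intermediate_set, "stilt_intermediate")

# Stage 2: the target task of interest (e.g. acceptability judgments),
# initialized from the intermediate checkpoint instead of roberta-large.
target_set = make_dataset(
    ["the cat sat on the mat", "cat the mat on sat the"],
    [1, 0],
)
fine_tune(intermediate_ckpt, target_set, "stilt_target")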
Anthology ID:
2021.findings-emnlp.61
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
706–713
URL:
https://aclanthology.org/2021.findings-emnlp.61
DOI:
10.18653/v1/2021.findings-emnlp.61
Cite (ACL):
Ting-Yun Chang and Chi-Jen Lu. 2021. Rethinking Why Intermediate-Task Fine-Tuning Works. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 706–713, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Rethinking Why Intermediate-Task Fine-Tuning Works (Chang & Lu, Findings 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.findings-emnlp.61.pdf
Software:
 2021.findings-emnlp.61.Software.zip
Video:
 https://preview.aclanthology.org/emnlp-22-attachments/2021.findings-emnlp.61.mp4
Code:
 terarachang/Rethinking_STILT
Data:
 CoLA, HellaSwag, SWAG, WiC, WinoGrande