Selecting Informative Contexts Improves Language Model Fine-tuning

Richard Antonello; Nicole Beckage; Javier Turek; Alexander Huth

doi:10.18653/v1/2021.acl-long.87

Abstract

Language model fine-tuning is essential for modern natural language processing, but is computationally expensive and time-consuming. Further, the effectiveness of fine-tuning is limited by the inclusion of training examples that negatively affect performance. Here we present a general fine-tuning method that we call information gain filtration for improving the overall training efficiency and final performance of language model fine-tuning. We define the information gain of an example as the improvement on a validation metric after training on that example. A secondary learner is then trained to approximate this quantity. During fine-tuning, this learner selects informative examples and skips uninformative ones. We show that our method has consistent improvement across datasets, fine-tuning tasks, and language model architectures. For example, we achieve a median perplexity of 54.0 on a books dataset compared to 57.3 for standard fine-tuning. We present statistical evidence that offers insight into the improvements of our method over standard fine-tuning. The generality of our method leads us to propose a new paradigm for language model fine-tuning — we encourage researchers to release pretrained secondary learners on common corpora to promote efficient and effective fine-tuning, thereby improving the performance and reducing the overall energy footprint of language model fine-tuning.
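The filtering loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a one-parameter regression model in place of a language model, mean squared error on a clean validation set as the validation metric, sign-flipped labels as the harmful examples, and a least-squares line on the hand-picked feature `x * y` as the secondary learner. All names (`make_example`, `information_gain`, `fit_secondary_learner`, `fine_tune`, `run_demo`) and the threshold of 0 are hypothetical choices made for illustration.

```python
import random

TRUE_W = 2.0  # hypothetical ground-truth slope of the toy task


def make_example(rng, corrupt_prob=0.3):
    """Draw (x, y) with y = TRUE_W * x; a corrupted example has its label sign flipped."""
    x = rng.uniform(0.5, 1.5)
    y = TRUE_W * x
    if rng.random() < corrupt_prob:
        y = -y
    return x, y


def val_loss(w, val):
    """Validation metric: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in val) / len(val)


def sgd_step(w, example, lr=0.05):
    """One gradient step on the squared error of a single example."""
    x, y = example
    return w - lr * 2.0 * (w * x - y) * x


def information_gain(w, example, val):
    """Improvement in the validation metric after training on `example`."""
    return val_loss(w, val) - val_loss(sgd_step(w, example), val)


def fit_secondary_learner(w0, probes, val):
    """Fit IG ~ a * (x * y) + b by least squares, with IG measured at the initial model w0."""
    feats = [x * y for x, y in probes]
    gains = [information_gain(w0, ex, val) for ex in probes]
    n = len(probes)
    mean_f, mean_g = sum(feats) / n, sum(gains) / n
    cov = sum((f - mean_f) * (g - mean_g) for f, g in zip(feats, gains))
    var = sum((f - mean_f) ** 2 for f in feats)
    a = cov / var
    b = mean_g - a * mean_f
    return lambda ex: a * ex[0] * ex[1] + b


def fine_tune(w0, stream, val, predictor=None, threshold=0.0):
    """Train on the stream; with a predictor, skip examples whose predicted IG is below threshold."""
    w = w0
    for ex in stream:
        if predictor is not None and predictor(ex) < threshold:
            continue  # predicted to be uninformative: skip without a training step
        w = sgd_step(w, ex)
    return val_loss(w, val)


def run_demo(seed=0):
    """Return (standard_loss, filtered_loss) on the same example stream."""
    rng = random.Random(seed)
    val = [make_example(rng, corrupt_prob=0.0) for _ in range(50)]
    probes = [make_example(rng) for _ in range(200)]
    stream = [make_example(rng) for _ in range(300)]
    predictor = fit_secondary_learner(0.0, probes, val)
    standard = fine_tune(0.0, stream, val)
    filtered = fine_tune(0.0, stream, val, predictor)
    return standard, filtered


if __name__ == "__main__":
    standard, filtered = run_demo()
    print(f"standard: {standard:.4f}  filtered: {filtered:.4f}")
```

In this sketch the secondary learner never sees the validation set at fine-tuning time. It only scores incoming examples, so the per-example cost of filtering is a single cheap prediction rather than a full train-and-evaluate cycle.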

Anthology ID: 2021.acl-long.87
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month: August
Year: 2021
Address: Online
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 1072–1085
URL: https://aclanthology.org/2021.acl-long.87
DOI: 10.18653/v1/2021.acl-long.87
Cite (ACL): Richard Antonello, Nicole Beckage, Javier Turek, and Alexander Huth. 2021. Selecting Informative Contexts Improves Language Model Fine-tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1072–1085, Online. Association for Computational Linguistics.
Cite (Informal): Selecting Informative Contexts Improves Language Model Fine-tuning (Antonello et al., ACL-IJCNLP 2021)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2021.acl-long.87.pdf
Video: https://preview.aclanthology.org/ingest-acl-2023-videos/2021.acl-long.87.mp4
Data: SST, SST-2, WikiText-103, WikiText-2
