Abstract
When distributional differences exist between pre-training and fine-tuning data, language models (LMs) may perform poorly on downstream tasks. Recent studies have reported that multi-task learning of a downstream task and a masked language modeling (MLM) task during the fine-tuning phase improves downstream performance. However, typical MLM tasks (e.g., random token masking, RTM) do not distinguish tokens whose knowledge was already acquired during pre-training, so LMs may overlook important clues or fail to effectively acquire the linguistic knowledge of the target task or domain. To overcome this limitation, we propose a new masking strategy for the MLM task, called L3Masking, that leverages lessons (specifically, token-wise likelihood in context) learned from the vanilla language model to be fine-tuned. L3Masking preferentially masks tokens with low likelihood under the vanilla model. Experimental evaluations on text classification tasks in different domains confirm that a multi-task text classification method with L3Masking performs task adaptation more effectively than one with RTM. These results suggest the usefulness of assigning a preference to the tokens to be learned during task or domain adaptation.
- Anthology ID:
- 2024.customnlp4u-1.6
- Volume:
- Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Sachin Kumar, Vidhisha Balachandran, Chan Young Park, Weijia Shi, Shirley Anugrah Hayati, Yulia Tsvetkov, Noah Smith, Hannaneh Hajishirzi, Dongyeop Kang, David Jurgens
- Venue:
- CustomNLP4U
- Publisher:
- Association for Computational Linguistics
- Pages:
- 53–62
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.customnlp4u-1.6/
- DOI:
- 10.18653/v1/2024.customnlp4u-1.6
- Cite (ACL):
- Yusuke Kimura, Takahiro Komamizu, and Kenji Hatano. 2024. L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models. In Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 53–62, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models (Kimura et al., CustomNLP4U 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.customnlp4u-1.6.pdf
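
To make the core idea of the abstract concrete, below is a minimal, illustrative sketch of likelihood-guided masking: score each token's likelihood under the vanilla (pre-trained, not yet fine-tuned) masked LM and preferentially mask the lowest-likelihood tokens for the auxiliary MLM objective. This is not the authors' implementation; the backbone name (bert-base-uncased), the masking ratio, and the single-forward-pass likelihood approximation are assumptions made for this example.

```python
# Illustrative sketch only: mask the tokens the vanilla masked LM finds least
# likely in context, so the auxiliary MLM objective focuses on them.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumed backbone, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
vanilla_model = AutoModelForMaskedLM.from_pretrained(model_name).eval()


def likelihood_guided_mask(text: str, mask_ratio: float = 0.15) -> dict:
    """Return masked input_ids and MLM labels for the lowest-likelihood tokens."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"][0]

    with torch.no_grad():
        # Single forward pass: probability the vanilla model assigns to each
        # observed token. (A pseudo-likelihood variant would re-mask each
        # position in turn, at higher cost.)
        logits = vanilla_model(**enc).logits[0]
        probs = torch.softmax(logits, dim=-1)
        token_likelihood = probs[torch.arange(len(input_ids)), input_ids]

    # Exclude special tokens ([CLS], [SEP], ...) from masking candidates.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            input_ids.tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    token_likelihood = token_likelihood.masked_fill(special, float("inf"))

    # Pick the lowest-likelihood positions up to the masking budget.
    n_mask = max(1, int(mask_ratio * (~special).sum().item()))
    mask_positions = torch.topk(-token_likelihood, n_mask).indices

    labels = torch.full_like(input_ids, -100)  # -100 = ignore index for MLM loss
    labels[mask_positions] = input_ids[mask_positions]
    masked_ids = input_ids.clone()
    masked_ids[mask_positions] = tokenizer.mask_token_id

    return {"input_ids": masked_ids, "labels": labels}
```

In a multi-task fine-tuning setup of the kind described in the abstract, the returned input_ids and labels would feed the auxiliary MLM loss alongside the downstream classification loss; how the two losses are weighted and whether a full pseudo-likelihood scoring is used are details left to the paper itself.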