Abstract
When distributional differences exist between pre-training and fine-tuning data, language models (LMs) may perform poorly on downstream tasks. Recent studies have reported that multi-task learning of a downstream task and a masked language modeling (MLM) task during the fine-tuning phase improves downstream performance. However, typical MLM tasks (e.g., random token masking, RTM) do not distinguish tokens whose knowledge was already acquired during pre-training, so LMs may overlook important clues or fail to effectively acquire the linguistic knowledge of the target task or domain. To overcome this limitation, we propose a new masking strategy for the MLM task, called L3Masking, that leverages lessons (specifically, token-wise likelihood in context) learned from the vanilla language model to be fine-tuned. L3Masking preferentially masks tokens with low likelihood under the vanilla model. Experimental evaluations on text classification tasks in different domains confirm that a multi-task text classification method with L3Masking performs task adaptation more effectively than one with RTM. These results suggest the usefulness of assigning a preference to the tokens to be learned during task or domain adaptation.
- Anthology ID:
- 2024.customnlp4u-1.6
- Volume:
- Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Sachin Kumar, Vidhisha Balachandran, Chan Young Park, Weijia Shi, Shirley Anugrah Hayati, Yulia Tsvetkov, Noah Smith, Hannaneh Hajishirzi, Dongyeop Kang, David Jurgens
- Venue:
- CustomNLP4U
- Publisher:
- Association for Computational Linguistics
- Pages:
- 53–62
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.customnlp4u-1.6/
- DOI:
- 10.18653/v1/2024.customnlp4u-1.6
- Cite (ACL):
- Yusuke Kimura, Takahiro Komamizu, and Kenji Hatano. 2024. L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models. In Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 53–62, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models (Kimura et al., CustomNLP4U 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.customnlp4u-1.6.pdf
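
To make the core idea of the abstract concrete, below is a minimal, illustrative sketch of likelihood-guided masking: score each token's likelihood under the vanilla (pre-trained, not yet fine-tuned) masked LM and preferentially mask the lowest-likelihood tokens for the auxiliary MLM objective. This is not the authors' implementation; the backbone name (bert-base-uncased), the masking ratio, and the single-forward-pass likelihood approximation are assumptions made for this example.

```python
# Illustrative sketch only: mask the tokens the vanilla masked LM finds least
# likely in context, so the auxiliary MLM objective focuses on them.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumed backbone, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
vanilla_model = AutoModelForMaskedLM.from_pretrained(model_name).eval()


def likelihood_guided_mask(text: str, mask_ratio: float = 0.15) -> dict:
    """Return masked input_ids and MLM labels for the lowest-likelihood tokens."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"][0]

    with torch.no_grad():
        # Single forward pass: probability the vanilla model assigns to each
        # observed token. (A pseudo-likelihood variant would re-mask each
        # position in turn, at higher cost.)
        logits = vanilla_model(**enc).logits[0]
        probs = torch.softmax(logits, dim=-1)
        token_likelihood = probs[torch.arange(len(input_ids)), input_ids]

    # Exclude special tokens ([CLS], [SEP], ...) from masking candidates.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            input_ids.tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    token_likelihood = token_likelihood.masked_fill(special, float("inf"))

    # Pick the lowest-likelihood positions up to the masking budget.
    n_mask = max(1, int(mask_ratio * (~special).sum().item()))
    mask_positions = torch.topk(-token_likelihood, n_mask).indices

    labels = torch.full_like(input_ids, -100)  # -100 = ignore index for MLM loss
    labels[mask_positions] = input_ids[mask_positions]
    masked_ids = input_ids.clone()
    masked_ids[mask_positions] = tokenizer.mask_token_id

    return {"input_ids": masked_ids, "labels": labels}
```

In a multi-task fine-tuning setup of the kind described in the abstract, the returned input_ids and labels would feed the auxiliary MLM loss alongside the downstream classification loss; how the two losses are weighted and whether a full pseudo-likelihood scoring is used are details left to the paper itself.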