Mohammad AkbarTajari


2022

Parameter-efficient fine-tuning has garnered lots of attention in recent studies.On this subject, we investigate the capability of different transformer modules in transferring knowledge from a pre-trained model to a downstream task. Our empirical results suggest that every transformer module is a winning ticket such that fine-tuning the specific module while the rest of the network is frozen achieves a comparable performance to the full fine-tuning case. Among different modules in LMs, LayerNorms exhibit a significant capacity for transfer learning to the extent that with only 0.003% updateable parameters in the layer-wise analysis, they can show acceptable performance on various target tasks.We argue that the performance of LayerNorms could be attributed to their high-magnitude weights compared to other components in a pre-trained model.