Transformers with Learnable Activation Functions

Haishuo Fang, Ji-Ung Lee, Nafise Sadat Moosavi, Iryna Gurevych


Abstract
Activation functions can have a significant impact on reducing the topological complexity of input data and, therefore, on a model’s performance. However, the choice of activation functions is seldom discussed or explored in Transformer-based language models. As a common practice, activation functions such as the Gaussian Error Linear Unit (GELU) are chosen beforehand and then remain fixed from pre-training to fine-tuning. In this paper, we investigate the impact of activation functions on Transformer-based models by utilizing rational activation functions (RAFs). In contrast to fixed activation functions (FAFs), RAFs are capable of learning the optimal activation functions from data. Our experiments show that the RAF-based Transformer model (RAFT) achieves better performance than its FAF-based counterpart. For instance, we find that RAFT outperforms its FAF-based counterpart on the GLUE benchmark by 5.71 points when using only 100 training examples and by 2.05 points on SQuAD with all available data. Analyzing the shapes of the learned RAFs further reveals that they vary across layers and tasks, opening a promising way to better analyze and understand large, pre-trained language models.
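
To illustrate the idea behind rational activation functions, below is a minimal PyTorch-style sketch of a "safe" RAF: a learnable ratio of polynomials R(x) = P(x) / Q(x) whose coefficients are trained jointly with the rest of the network. The degrees (5, 4), the initialization, and the module name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a "safe" rational activation function (RAF):
# R(x) = P(x) / Q(x), with learnable polynomial coefficients.
# Degrees (5, 4) follow the Pade-style rational activations commonly used
# in the literature; this is an assumption, not the paper's exact setup.
import torch
import torch.nn as nn


class RationalActivation(nn.Module):
    def __init__(self, numerator_degree: int = 5, denominator_degree: int = 4):
        super().__init__()
        # Learnable coefficients, updated by backpropagation like any weight.
        self.a = nn.Parameter(torch.randn(numerator_degree + 1) * 0.1)
        self.b = nn.Parameter(torch.randn(denominator_degree) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # P(x) = a_0 + a_1 x + ... + a_m x^m
        numerator = sum(a_j * x**j for j, a_j in enumerate(self.a))
        # Q(x) = 1 + |b_1 x + ... + b_n x^n|  (the "safe" form avoids poles)
        denominator = 1.0 + torch.abs(
            sum(b_k * x**(k + 1) for k, b_k in enumerate(self.b))
        )
        return numerator / denominator


# Usage: a drop-in replacement for a fixed activation (e.g. GELU) in a
# Transformer feed-forward block.
act = RationalActivation()
hidden = act(torch.randn(2, 8, 768))
```

In a Transformer, such a module would replace the fixed GELU in each feed-forward block, so that every layer can learn its own activation shape from the data.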
Anthology ID:
2023.findings-eacl.181
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2382–2398
URL:
https://aclanthology.org/2023.findings-eacl.181
DOI:
10.18653/v1/2023.findings-eacl.181
Cite (ACL):
Haishuo Fang, Ji-Ung Lee, Nafise Sadat Moosavi, and Iryna Gurevych. 2023. Transformers with Learnable Activation Functions. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2382–2398, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Transformers with Learnable Activation Functions (Fang et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-eacl.181.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-eacl.181.mp4