Abstract
Activation functions play a crucial role in neural networks: they are the nonlinearities to which the success of deep learning is often attributed. One of the currently most popular activation functions is ReLU, but several competitors have recently been proposed or ‘discovered’, including LReLU functions and swish. While most works compare newly proposed activation functions on few tasks (usually from image classification) and against few competitors (usually ReLU), we perform the first large-scale comparison of 21 activation functions across eight different NLP tasks. We find that a largely unknown activation function, the so-called penalized tanh, performs most stably across all tasks. We also show that it can successfully replace the sigmoid and tanh gates in LSTM cells, leading to a 2 percentage point (pp) improvement over the standard choices on a challenging NLP task.
- Anthology ID:
- D18-1472
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4415–4424
- URL:
- https://aclanthology.org/D18-1472
- DOI:
- 10.18653/v1/D18-1472
- Cite (ACL):
- Steffen Eger, Paul Youssef, and Iryna Gurevych. 2018. Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4415–4424, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks (Eger et al., EMNLP 2018)
- PDF:
- https://aclanthology.org/D18-1472.pdf
- Code
- UKPLab/emnlp2018-activation-functions
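
For orientation, the two activations highlighted in the abstract can be sketched as follows. This is a minimal illustration, not code from the linked repository; the penalty factor 0.25 for penalized tanh follows the common formulation in the literature and is an assumption here.

```python
import math

def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))

def penalized_tanh(x: float, a: float = 0.25) -> float:
    """Penalized tanh: tanh(x) for x > 0, otherwise a * tanh(x).

    The penalty a = 0.25 is the commonly used value (assumed here);
    the smaller negative slope penalizes negative pre-activations.
    """
    t = math.tanh(x)
    return t if x > 0 else a * t
```

In an LSTM cell, the paper's experiment amounts to substituting `penalized_tanh` for the sigmoid and tanh gate nonlinearities.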