HyperMixer: An MLP-based Low Cost Alternative to Transformers
Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, François Marelli, François Fleuret, James Henderson
Abstract
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
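The mechanism the abstract describes, generating the token-mixing MLP's weights from the token representations themselves via hypernetworks, can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration of that idea, not the authors' released implementation: the module name, the hypernetwork shapes, and the dimensions `d_model` and `d_prime` are assumptions, and details covered in the paper (e.g., positional information and possible weight tying between the two hypernetworks) are omitted.

```python
import torch
import torch.nn as nn


class HyperTokenMixer(nn.Module):
    """Sketch of hypernetwork-generated token mixing (the HyperMixer idea).

    Instead of MLPMixer's fixed token-mixing weights, two small
    position-wise MLPs (hypernetworks) produce the mixing weights
    W1 and W2 from the tokens themselves, so the layer handles
    variable-length inputs and scales linearly in sequence length.
    """

    def __init__(self, d_model: int = 256, d_prime: int = 512):
        super().__init__()
        # Hypernetworks: map each token's features to one row of W1 / W2.
        # (Illustrative architecture; not the paper's exact configuration.)
        self.hyper_in = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_prime)
        )
        self.hyper_out = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_prime)
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, d_model)
        w1 = self.hyper_in(x)   # (batch, n_tokens, d_prime)
        w2 = self.hyper_out(x)  # (batch, n_tokens, d_prime)
        # Token mixing: W2 @ GELU(W1^T @ X), mixing across the token axis.
        hidden = self.act(torch.einsum("bnd,bnp->bpd", x, w1))
        return torch.einsum("bpd,bnp->bnd", hidden, w2)


# Usage: mix tokens of a variable-length sequence.
mixer = HyperTokenMixer(d_model=256, d_prime=512)
out = mixer(torch.randn(2, 50, 256))  # -> shape (2, 50, 256)
```

Because the generated weight matrices have shape (n_tokens, d_prime) with d_prime fixed, the mixing step costs O(N) in the number of tokens N rather than the O(N²) of self-attention, which is the processing-time advantage the abstract claims.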
- Anthology ID:
- 2023.acl-long.871
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15632–15654
- URL:
- https://aclanthology.org/2023.acl-long.871
- Cite (ACL):
- Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, François Marelli, François Fleuret, and James Henderson. 2023. HyperMixer: An MLP-based Low Cost Alternative to Transformers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15632–15654, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- HyperMixer: An MLP-based Low Cost Alternative to Transformers (Mai et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2023.acl-long.871.pdf