Approximating Two-Layer Feedforward Networks for Efficient Transformers

Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber


Abstract
How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel perspectives on MoEs, presenting a general framework that *unifies* various methods to *approximate two-layer NNs* (e.g., feedforward blocks of Transformers), including product-key memories (PKMs). Leveraging insights from this framework, we propose methods to improve both MoEs and PKMs. Unlike prior work that compares MoEs with dense baselines under the *compute-equal* condition, our evaluation condition is *parameter-equal*, which is crucial to properly evaluate LMs. We show that our MoEs are competitive with the *dense* Transformer-XL on both the WikiText-103 and enwik8 datasets at two different scales, while being much more resource efficient. This demonstrates that MoEs are relevant not only to extremely large LMs but also to any-scale resource-efficient LMs. Our code is public.
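The central idea of the abstract — treating a sparse MoE layer as an approximation of a dense two-layer feedforward block — can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is a generic top-k expert-selection feedforward layer in PyTorch, and all names (`MoEFeedForward`, `n_experts`, `d_expert`, `k`) are chosen here for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Sparse sketch of a dense two-layer feedforward block.

    A dense block computes y = W2 @ relu(W1 @ x). Here the hidden units are
    partitioned into `n_experts` groups ("experts"); a small router scores the
    experts per token and only the top-k of them are evaluated.
    """

    def __init__(self, d_model: int, d_expert: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # One weight slice per expert, stored as batched tensors.
        self.w1 = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * d_model ** -0.5)
        self.w2 = nn.Parameter(torch.randn(n_experts, d_expert, d_model) * d_expert ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = torch.sigmoid(self.router(x))      # (n_tokens, n_experts)
        gate, idx = scores.topk(self.k, dim=-1)     # keep the k best-scoring experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            e = idx[:, slot]                        # selected expert id per token
            h = F.relu(torch.einsum("td,tdh->th", x, self.w1[e]))
            out = out + gate[:, slot, None] * torch.einsum("th,thd->td", h, self.w2[e])
        return out
```

With `n_experts * d_expert` set equal to the hidden size of the dense block, such a layer is parameter-matched to the dense feedforward block while computing only k experts per token, which mirrors the parameter-equal comparison emphasized in the abstract.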
Anthology ID:
2023.findings-emnlp.49
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
674–692
URL:
https://aclanthology.org/2023.findings-emnlp.49
DOI:
10.18653/v1/2023.findings-emnlp.49
Cite (ACL):
Róbert Csordás, Kazuki Irie, and Jürgen Schmidhuber. 2023. Approximating Two-Layer Feedforward Networks for Efficient Transformers. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 674–692, Singapore. Association for Computational Linguistics.
Cite (Informal):
Approximating Two-Layer Feedforward Networks for Efficient Transformers (Csordás et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2023.findings-emnlp.49.pdf