What’s Hidden in a One-layer Randomly Weighted Transformer?

Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael Mahoney


Abstract
We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that achieve impressive performance on machine translation tasks without ever modifying the weight initializations. To find subnetworks in a one-layer randomly weighted network, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find subnetworks that achieve 29.45/17.29 BLEU on IWSLT14/WMT14. With a fixed pre-trained embedding layer, these subnetworks are smaller than a trained small/base Transformer, yet match 98%/92% of its performance (34.14/25.24 BLEU) on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper Transformers in this setting, as well as the impact of different initialization methods.
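The core mechanism the abstract describes, reusing a single frozen randomly initialized weight matrix across layers and learning only a per-layer binary mask, can be illustrated with a minimal PyTorch sketch. This is not the released sincerass/one_layer_lottery_ticket code; it assumes an edge-popup-style top-k mask trained with a straight-through estimator, and all names, shapes, and hyperparameters below are illustrative.

```python
# Minimal sketch (not the authors' implementation): one frozen random weight
# matrix is shared by all "layers"; each layer learns only a binary mask over
# it, selected by top-k scores with a straight-through estimator.
import torch
import torch.nn as nn


class TopKMask(torch.autograd.Function):
    """Keep the top-k fraction of scores as a {0,1} mask; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = torch.topk(scores.flatten(), k, largest=True).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradient w.r.t. the mask flows to the scores unchanged.
        return grad_output, None


class MaskedSharedLinear(nn.Module):
    """A 'layer' that reuses the frozen shared weight and learns only its own mask scores."""

    def __init__(self, shared_weight, keep_ratio=0.5):
        super().__init__()
        self.shared_weight = shared_weight            # frozen, shared across layers
        self.scores = nn.Parameter(0.01 * torch.randn_like(shared_weight))
        self.keep_ratio = keep_ratio

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.keep_ratio)
        return x @ (self.shared_weight * mask).t()


if __name__ == "__main__":
    d = 64
    # One randomly initialized weight matrix that is never updated.
    shared_w = torch.randn(d, d) * (2.0 / d) ** 0.5
    shared_w.requires_grad_(False)

    # Two "different layers" carved out of the same weights by different masks.
    layer1 = MaskedSharedLinear(shared_w)
    layer2 = MaskedSharedLinear(shared_w)

    x = torch.randn(8, d)
    y = layer2(torch.relu(layer1(x)))
    y.sum().backward()                                # only the mask scores receive gradients
    print(layer1.scores.grad is not None, shared_w.grad is None)
```

In a full Transformer this idea would be applied per weight matrix (attention projections, feed-forward weights), with a separate set of mask scores per layer, while the random weights and the pre-trained embedding layer stay fixed.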
Anthology ID:
2021.emnlp-main.231
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2914–2921
URL:
https://aclanthology.org/2021.emnlp-main.231
DOI:
10.18653/v1/2021.emnlp-main.231
Cite (ACL):
Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, and Michael Mahoney. 2021. What’s Hidden in a One-layer Randomly Weighted Transformer?. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2914–2921, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
What’s Hidden in a One-layer Randomly Weighted Transformer? (Shen et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2021.emnlp-main.231.pdf
Video:
 https://preview.aclanthology.org/paclic-22-ingestion/2021.emnlp-main.231.mp4
Code
 sincerass/one_layer_lottery_ticket
Data
MultiNLI
WMT 2014