What’s Hidden in a One-layer Randomly Weighted Transformer?
Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael Mahoney
Abstract
We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance on machine translation tasks, without ever modifying the weight initializations. To find subnetworks in a one-layer randomly weighted neural network, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14. Using a fixed pre-trained embedding layer, these subnetworks are smaller than, but match 98%/92% of the performance of, a trained Transformer_small/base (34.14/25.24 BLEU) on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper transformers in this setting, as well as the impact of different initialization methods.
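The masking scheme described in the abstract can be illustrated with a short sketch. This is not the authors' implementation (see the linked code repository for that); the layer sizes, the 50% sparsity, and the straight-through top-k binarization are illustrative assumptions. It shows how a single frozen random weight matrix can stand in for several layers, each selected by its own learned binary mask.

```python
import torch
import torch.nn as nn


class MaskedSharedLinear(nn.Module):
    """One frozen random weight matrix reused across layers via per-layer binary masks."""

    def __init__(self, d_model: int, n_layers: int, sparsity: float = 0.5):
        super().__init__()
        # Shared random weights: frozen at initialization, never trained.
        self.weight = nn.Parameter(torch.empty(d_model, d_model), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)
        # One real-valued score tensor per layer; only these scores are trained.
        self.scores = nn.Parameter(0.01 * torch.randn(n_layers, d_model, d_model))
        self.sparsity = sparsity

    def mask(self, layer: int) -> torch.Tensor:
        # Binarize: keep the top (1 - sparsity) fraction of scores as 1s, zero the rest.
        flat = self.scores[layer].flatten()
        k = max(1, int((1.0 - self.sparsity) * flat.numel()))
        threshold = flat.topk(k).values[-1]
        binary = (self.scores[layer] >= threshold).float()
        # Straight-through estimator: forward uses the binary mask,
        # backward passes gradients to the continuous scores.
        return binary + self.scores[layer] - self.scores[layer].detach()

    def forward(self, x: torch.Tensor, layer: int) -> torch.Tensor:
        # Each "layer" is the same random weight matrix under a different mask.
        return x @ (self.weight * self.mask(layer)).t()


if __name__ == "__main__":
    shared = MaskedSharedLinear(d_model=512, n_layers=6)
    x = torch.randn(10, 512)
    for layer in range(6):
        x = torch.relu(shared(x, layer))
    print(x.shape)  # torch.Size([10, 512])
```

In this sketch only the score tensors receive gradients; the shared weight matrix stays at its random initialization, mirroring the paper's setting of searching for subnetworks without modifying the weights.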
- Anthology ID:
- 2021.emnlp-main.231
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2914–2921
- URL:
- https://aclanthology.org/2021.emnlp-main.231
- DOI:
- 10.18653/v1/2021.emnlp-main.231
- Cite (ACL):
- Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, and Michael Mahoney. 2021. What’s Hidden in a One-layer Randomly Weighted Transformer?. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2914–2921, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- What’s Hidden in a One-layer Randomly Weighted Transformer? (Shen et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.emnlp-main.231.pdf
- Code
- sincerass/one_layer_lottery_ticket
- Data
- MultiNLI, WMT 2014