Abstract
Existing asynchronous parallel learning methods are designed for sparse-feature models, and they face new challenges for dense-feature models such as neural networks (e.g., LSTM, RNN). The problem with dense features is that asynchronous parallel learning introduces gradient errors caused by overwrite actions. We show that such gradient errors are common and inevitable. Nevertheless, our theoretical analysis shows that the learning process with gradient errors can still converge towards the optimum of the objective function for many practical applications. Thus, we propose a simple method, AsynGrad, for asynchronous parallel learning with gradient errors. Experiments on various dense-feature models (LSTM, dense-CRF) and various NLP tasks show that AsynGrad achieves a substantial improvement in training speed without any loss of accuracy.
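The setting the abstract describes is lock-free asynchronous training of a shared dense parameter vector, where workers can overwrite each other's updates and thereby introduce gradient errors. The snippet below is a minimal, hypothetical sketch of that setting on a toy least-squares problem; it is not the paper's AsynGrad implementation, and the data, learning rate, minibatch size, and worker count are illustrative assumptions.

```python
# Minimal sketch (not the paper's AsynGrad): lock-free asynchronous SGD
# on a shared dense parameter vector. Each worker reads the shared
# weights, computes a minibatch gradient, and writes back without
# locking, so concurrent updates may partially overwrite each other.
import threading
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem (hypothetical data): minimize ||Xw - y||^2.
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(20)            # shared dense parameters, updated without locks
lr = 1e-3
n_workers = 4
steps_per_worker = 500

def worker(shard):
    """Run SGD on one data shard against the shared parameter vector w."""
    local_rng = np.random.default_rng()
    Xs, ys = X[shard], y[shard]
    for _ in range(steps_per_worker):
        idx = local_rng.integers(0, len(ys), size=32)          # minibatch
        grad = 2.0 * Xs[idx].T @ (Xs[idx] @ w - ys[idx]) / 32
        # Unsynchronized read-modify-write: another worker may change w
        # between the read above and this write, so part of its update
        # can be overwritten -- the "gradient error" in the abstract.
        w[:] = w - lr * grad

shards = np.array_split(np.arange(len(y)), n_workers)
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final mean squared error:", float(np.mean((X @ w - y) ** 2)))
```

Despite the unsynchronized writes, the loss still decreases in practice, which is the kind of behavior the paper's convergence analysis addresses for such lock-free updates.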
- Anthology ID: C16-1019
- Volume: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month: December
- Year: 2016
- Address: Osaka, Japan
- Venue: COLING
- Publisher: The COLING 2016 Organizing Committee
- Pages: 192–202
- URL: https://aclanthology.org/C16-1019
- Cite (ACL): Xu Sun. 2016. Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 192–202, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal): Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features (Sun, COLING 2016)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/C16-1019.pdf