Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features

Xu Sun


Abstract
Existing asynchronous parallel learning methods are designed only for sparse feature models, and they face new challenges for dense feature models such as neural networks (e.g., LSTM, RNN). The problem with dense features is that asynchronous parallel learning introduces gradient errors caused by overwrite actions. We show that gradient errors are very common and inevitable. Nevertheless, our theoretical analysis shows that the learning process with gradient errors can still converge towards the optimum of the objective function for many practical applications. Thus, we propose a simple method, AsynGrad, for asynchronous parallel learning with gradient errors. Based on various dense feature models (LSTM, dense-CRF) and various NLP tasks, experiments show that AsynGrad achieves substantial improvement in training speed without any loss of accuracy.
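To make the overwrite problem concrete, the following is a minimal Python sketch of lock-free asynchronous SGD of the kind the abstract describes: worker threads read and write a shared dense parameter vector without synchronization, so concurrent updates can partially overwrite each other, producing the "gradient errors" the paper analyzes. This is an illustrative toy under assumed settings (a linear-regression objective, a fixed learning rate, four threads), not the paper's AsynGrad implementation.

```python
# Toy lock-free asynchronous SGD (Hogwild-style sketch, not the paper's code).
# Several threads update a shared dense parameter vector without locks,
# so updates based on stale reads may overwrite each other (gradient errors).
import threading
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 2000
true_w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ true_w + 0.01 * rng.normal(size=n)

w = np.zeros(d)   # shared parameters, updated without a lock (assumption: dense vector)
lr = 0.01         # assumed learning rate for this toy example

def worker(rows):
    global w
    for i in rows:
        # Read a possibly stale snapshot of the shared parameters.
        w_local = w.copy()
        # Gradient of the squared error on example i.
        grad = (X[i] @ w_local - y[i]) * X[i]
        # Lock-free write: another thread may have updated w in the meantime,
        # so part of that update is overwritten here (an inevitable gradient error).
        w = w_local - lr * grad

# Each thread processes a disjoint stripe of the data.
threads = [threading.Thread(target=worker, args=(range(s, n, 4),))
           for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("parameter error:", np.linalg.norm(w - true_w))
```

Despite the unsynchronized writes, the shared parameters typically still move close to the true solution in this toy setting, which mirrors the paper's claim that learning with gradient errors can remain convergent in practice.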
Anthology ID:
C16-1019
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
192–202
URL:
https://aclanthology.org/C16-1019
Cite (ACL):
Xu Sun. 2016. Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 192–202, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features (Sun, COLING 2016)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/C16-1019.pdf