Robust Gradient Descent via Moment Encoding and LDPC Codes
Abstract
This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors. To mitigate the effect of stragglers, it has previously been proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. We instead propose to encode the second moment of the data with a low-density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead, and the number of decoding iterations can be made to adjust automatically to the number of stragglers in the system. Under a random model for stragglers, we obtain convergence guarantees for the proposed solution by viewing it as an instance of the stochastic gradient descent method. Furthermore, the proposed solution outperforms existing schemes in a real distributed computing setup.
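To make the setting concrete, the following is a minimal NumPy sketch of the moment-encoding idea for least squares, where the gradient depends on the data only through the second moment M = X^T X and the vector b = X^T y. The block sizes, the toy sparse parity code standing in for a full LDPC code, the peeling decoder, the straggler model, and the helper names (coded_products, peel_decode) are all illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Problem setup: least squares, grad(w) = M w - b with M = X^T X ---
n, d = 200, 8
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
M = X.T @ X          # second moment, computed once up front
b = X.T @ y

# --- Encode row-blocks of M with a sparse, LDPC-like parity code ---
# (a toy stand-in: each parity block is the sum of a small random
#  subset of data blocks; a real LDPC code would be used in practice)
k, r = 4, 2                      # number of data blocks and parity blocks
blocks = np.array_split(M, k, axis=0)
parity_sets = [sorted(rng.choice(k, size=2, replace=False)) for _ in range(r)]
parities = [sum(blocks[i] for i in s) for s in parity_sets]

def coded_products(w, stragglers):
    """Each of the k + r workers returns its block times w; stragglers return None."""
    outs = [blk @ w for blk in blocks] + [p @ w for p in parities]
    return [None if i in stragglers else o for i, o in enumerate(outs)]

def peel_decode(outs):
    """Iteratively recover missing data-block products from parity products."""
    data = outs[:k]
    for _ in range(r):                       # at most r peeling passes
        for j, s in enumerate(parity_sets):
            p = outs[k + j]
            missing = [i for i in s if data[i] is None]
            if p is not None and len(missing) == 1:
                data[missing[0]] = p - sum(data[i] for i in s if i != missing[0])
    return data

# --- Gradient descent driven by decoded (possibly partial) products ---
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(M, 2)
for t in range(200):
    stragglers = set(rng.choice(k + r, size=1, replace=False))  # random straggler model
    data = peel_decode(coded_products(w, stragglers))
    # a block that remains unrecovered is treated as zero, giving a
    # noisy gradient -- the SGD view used in the convergence analysis
    Mw = np.concatenate([out if out is not None else np.zeros(blk.shape[0])
                         for out, blk in zip(data, blocks)])
    w -= lr * (Mw - b)

print("residual:", np.linalg.norm(X @ w - y))
```

The sketch illustrates the two properties highlighted above: decoding is a cheap iterative (peeling) procedure, and when stragglers leave a block unrecovered the update degrades gracefully to a stochastic gradient step rather than stalling the iteration.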