MapReduce/Bigtable for Distributed Optimization

Keith B. Hall
Scott Gilpin
Gideon Mann
Neural Information Processing Systems Workshop on Learning on Cores, Clusters, and Clouds (2010)

Abstract

For large data it can be very time consuming to run gradient-based optimization, for example to minimize the log-likelihood for maximum entropy models. Distributed methods are therefore appealing, and a number of distributed gradient optimization strategies have been proposed, including distributed gradient, asynchronous updates, and iterative parameter mixtures. In this paper, we evaluate these strategies with regard to their accuracy and speed over MapReduce/Bigtable and discuss the techniques needed for high performance.
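To make two of the strategies named in the abstract concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes binary logistic regression as the simplest maximum entropy model, plain Python lists standing in for data shards, and ordinary function calls standing in for MapReduce map and reduce phases. All function names, the toy data generator, and the learning rates are hypothetical choices made for illustration. Asynchronous updates are not shown because they depend on a shared parameter store, which this sketch does not model.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def shard_gradient(w, shard):
    # "Map" step (assumed): gradient of the negative log-likelihood on one shard.
    g = [0.0] * len(w)
    for x, y in shard:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g

def distributed_gradient_step(w, shards, lr):
    # "Reduce" step (assumed): sum the shard gradients, then take one global step.
    n = sum(len(s) for s in shards)
    total = [sum(col) for col in zip(*(shard_gradient(w, s) for s in shards))]
    return [wi - lr * gi / n for wi, gi in zip(w, total)]

def local_sgd_epoch(w, shard, lr):
    # One pass of stochastic gradient descent over a single shard.
    w = list(w)
    for x, y in shard:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def parameter_mixture_step(w, shards, lr):
    # Every shard trains independently from the same starting weights;
    # the resulting weight vectors are averaged with uniform mixing weights.
    local_weights = [local_sgd_epoch(w, s, lr) for s in shards]
    return [sum(col) / len(shards) for col in zip(*local_weights)]

def neg_log_likelihood(w, data):
    # Mean negative log-likelihood, with probabilities clipped to avoid log(0).
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def make_toy_data(n=200, seed=0):
    # Hypothetical toy data: a bias feature plus two Gaussian features.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = 1 if x1 + 0.5 * x2 + rng.gauss(0, 0.3) > 0 else 0
        data.append(([1.0, x1, x2], y))
    return data

if __name__ == "__main__":
    data = make_toy_data()
    shards = [data[i::4] for i in range(4)]  # four simulated workers
    w_dg = [0.0, 0.0, 0.0]
    w_pm = [0.0, 0.0, 0.0]
    for _ in range(20):
        w_dg = distributed_gradient_step(w_dg, shards, lr=0.5)
        w_pm = parameter_mixture_step(w_pm, shards, lr=0.1)
    print("distributed gradient NLL:", neg_log_likelihood(w_dg, data))
    print("parameter mixture NLL:   ", neg_log_likelihood(w_pm, data))
```

In this sketch the distributed gradient step exchanges one gradient per shard for every global update, while the parameter mixture step exchanges one weight vector per shard after a full local pass. That difference in communication per unit of local work is the kind of trade-off between accuracy and speed that the abstract refers to.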
