MapReduce: The programming model and practice

Jerry Zhao; Jelena Pjesivac-Grbovic


SIGMETRICS (2009)


Abstract

Inspired by similar concepts in functional languages dating as early as the 1960s, Google first introduced MapReduce in 2004. MapReduce has since become the most popular framework for large-scale data processing at Google, and it is becoming the framework of choice on many off-the-shelf clusters.

In this tutorial, we first introduce the MapReduce programming model and illustrate its power with a couple of examples. We discuss MapReduce and its relationship to MPI and DBMS. Performance is a key feature of the Google MapReduce implementation, and we discuss a few techniques used to achieve this goal. Google MapReduce exploits data locality to reduce network overhead. We use different scheduling techniques to ensure that a job keeps progressing in the presence of variable system load. Finally, since failures are common in our data centers, we provide a number of failure avoidance and recovery features to ensure job completion in such an environment.
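A minimal sketch of the programming model, assuming the classic word-count job as the example: the user writes only a map function, which emits intermediate key/value pairs, and a reduce function, which combines all values that share a key. The framework groups the intermediate values by key between the two phases. The names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical; this is a single-process illustration, not the Google MapReduce API, and it leaves out distribution, data locality, scheduling, and fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical type aliases for user-supplied functions (not a real library API).
Mapper = Callable[[str, str], Iterable[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], int]


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated word in one record.
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> int:
    # Reduce phase: combine every value emitted for a single key.
    return sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[str, str]], mapper: Mapper, reducer: Reducer
) -> Dict[str, int]:
    # "Shuffle" step: group intermediate values by key. In a real cluster this
    # is a distributed exchange between map and reduce tasks; here it is one dict.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    # Apply the reducer once per key, visiting keys in sorted order.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog and the fox")]
counts = run_mapreduce(docs, map_words, reduce_counts)
print(counts)
```

In a distributed deployment, the grouping step is where intermediate keys are partitioned across reduce tasks (for example, by hashing the key); this sketch collapses that into a single in-memory dictionary so that the map/reduce contract is easy to see.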
