Tutorial: New Templates for Scalable Data Analysis

Amr Ahmed; Alex Smola; Markus Weimer

The 21st International World Wide Web conference (WWW) (2012)

Abstract

Scalable data analysis has come a long way since the introduction
of the MapReduce paradigm a decade ago. In this tutorial we present
algorithms for synchronous and asynchronous data processing. They
are capable of dealing with the amounts of data typically available
on the internet.
We give a brief description of the problems one faces when
performing scalable machine learning on the internet. To motivate
matters we provide a number of scenarios from spam filtering,
advertising and collaborative filtering. This is followed by an
extensive discussion of current and novel synchronous data
processing techniques. In particular we emphasize how insights from
systems research and databases can be used to achieve significant
improvements both in the expressiveness and in the efficiency of
the deployed algorithms.
We then describe asynchronous data analysis and inference methods.
These are particularly necessary whenever the estimation problem
requires a significant number of latent variables, as in
clustering, topic models, or graph factorization. We provide an
ample number of motivating examples and applications, ranging from
user profiling to the analysis of communication networks. Special
emphasis is placed on the approximations needed to scale algorithms
to hundreds of millions of users and billions of documents.

Research Areas

Machine intelligence
