Data Management Challenges in Production Machine Learning

Alkis Polyzotis; Martin A. Zinkevich; Steven Whang; Sudip Roy

Proceedings of the 2017 ACM International Conference on Management of Data, ACM, New York, NY, USA, pp. 1723-1726

Abstract

This tutorial discusses data-management issues that
arise in the context of production ML pipelines. Informed
by our own experience with such large-scale pipelines, we
focus on issues related to validating, debugging, cleaning,
understanding, and enriching training data. The goal of the
tutorial is to bring forth these issues, draw connections to
prior work in the database literature, and outline the open
research questions that are not addressed by prior art. We
believe that the data management community is well positioned
to address these issues, and we hope to motivate the
audience to look more closely at this area.

Research Areas

Machine intelligence
