Google Research

From Data to Models and Back

ACM

Abstract

Production ML is more than writing the code for the trainer. It requires processes and tooling that enable a larger team to share, track, analyze, and monitor not only on the code for ML but also on the artifacts (Datasets, Models, ...) that are manipulated and generated in these production ML pipelines. In this paper we describe the tools we developed at Google for the analysis and validation of two of the most important types of artifacts: Datasets and Models. These tools are currently deployed in production at Google and other large organizations. Our approach is heavily inspired by well-known principles of data-management systems. Ultimately, we want to enable users to trust their data and models, and understand how data properties affect the quality of the generated ML models.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work