The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction

Eric Breck

Shanqing Cai

Eric Nielsen

Michael Salib

D. Sculley

Proceedings of IEEE Big Data (2017)

Abstract

Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems, to help quantify these issues and present an easy-to-follow roadmap to improve production readiness and pay down ML technical debt.

Research Areas

Software Engineering
Machine Intelligence
