Underspecification Presents Challenges for Credibility in Modern Machine Learning

Alexander Nicholas D'Amour

Katherine Heller

Dan Moldovan

Ben Adlam

Babak Alipanahi

Alex Beutel

Christina Chen

Jon Deaton

Jacob Eisenstein

Matthew D. Hoffman

Farhad Hormozdiari

Shaobo Hou

Neil Houlsby

Ghassen Jerfel

Alan Karthikesalingam

Mario Lučić

Yian Ma

Cory McLean

Diana Mincu

Akinori Mitani

Andrea Montanari

Zachary Nado

Vivek Natarajan

Christopher Nielsen

Thomas Osborne

Rajiv Raman

Kim Ramasamy

Rory Abbott Sayres

Jessica Schrouff

Martin Gamunu Seneviratne

Shannon Sequeira

Harini Suresh

Victor Veitch

Max Vladymyrov

Xuezhi Wang

Kellie Webster

Steve Yadlowsky

Taedong Yun

Xiaohua Zhai

D. Sculley

Journal of Machine Learning Research (2020)

Abstract

ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
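The definition above can be made concrete with a toy sketch. This example is not taken from the paper; it is an illustrative assumption built on a two-feature linear regression, and every name in it (`make_data`, `fit`, `mse`, the seeds, the sample sizes) is hypothetical. In the training domain the second feature duplicates the first, so many weight vectors fit equally well, and the random initialization decides which one gradient descent returns. In the deployment domain the two features are independent, and the predictors no longer agree.

```python
import numpy as np

def make_data(rng, n, correlated):
    """Two features; only x1 drives y. In the training domain x2 copies x1."""
    x1 = rng.normal(size=n)
    x2 = x1.copy() if correlated else rng.normal(size=n)
    return np.column_stack([x1, x2]), x1.copy()

def fit(X, y, seed, steps=500, lr=0.1):
    """Least squares by gradient descent from a seed-dependent random start."""
    w = np.random.default_rng(seed).normal(size=X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def mse(w, X, y):
    """Mean squared error of the linear predictor w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

# Hypothetical data: training and held-out sets share the duplicated feature;
# the deployment set breaks that duplication.
data_rng = np.random.default_rng(100)
X_train, y_train = make_data(data_rng, 500, correlated=True)
X_heldout, y_heldout = make_data(data_rng, 500, correlated=True)
X_deploy, y_deploy = make_data(data_rng, 2000, correlated=False)

# Same pipeline, same data, five different random seeds.
weights = [fit(X_train, y_train, seed) for seed in range(5)]
heldout_errors = [mse(w, X_heldout, y_heldout) for w in weights]
deploy_errors = [mse(w, X_deploy, y_deploy) for w in weights]

print("held-out (training domain):", np.round(heldout_errors, 6))
print("deployment domain:         ", np.round(deploy_errors, 3))
```

In this sketch the held-out errors are all essentially zero, so the five predictors look interchangeable by the usual training-domain check, while their deployment-domain errors differ from seed to seed. The pipelines studied in the paper are far larger, and this toy only mirrors the shape of the argument.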

Research Areas

Machine Intelligence
