Useful Confidence Measures: Beyond the Max Score

Gal Oshrat Yona

Amir Feder

Itay Laish

NeurIPS 2022 Workshop on Distribution Shifts (DistShift) (2022) (to appear)

Abstract

An important component in deploying machine learning (ML) in safety-critical applications is having a reliable measure of confidence in the ML's predictions. For a classifier $f$ producing a probability vector $f(x)$ over the candidate classes, the confidence is typically taken to be $\max_i f(x)_i$. This approach is potentially limited, as it disregards the rest of the probability vector. In this work, we derive several confidence measures that depend on information beyond the maximum score, such as margin-based and entropy-based measures, and empirically evaluate their usefulness. We focus on NLP tasks and Transformer-based models. We show that in the "out of the box" regime (where the scores of $f$ are used as is), using only the maximum score to inform the confidence measure is highly suboptimal. In the post-processing regime (where the scores of $f$ can be improved using additional held-out data), this remains true (though the differences are less pronounced), with entropy-based confidence emerging as a surprisingly useful measure.
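To make the distinction concrete, the following is a minimal sketch of the three kinds of confidence measure the abstract names, computed from a single probability vector $f(x)$. The function names and the exact formulas (top-two margin, one minus normalized Shannon entropy) are illustrative assumptions based on common definitions; the abstract does not spell out the precise variants the paper evaluates.

```python
import math
from typing import Sequence


def max_score(probs: Sequence[float]) -> float:
    """Baseline confidence: the largest class probability, max_i f(x)_i."""
    return max(probs)


def margin(probs: Sequence[float]) -> float:
    """Margin-based confidence (assumed form): gap between the top two scores."""
    top, runner_up = sorted(probs, reverse=True)[:2]
    return top - runner_up


def entropy_confidence(probs: Sequence[float]) -> float:
    """Entropy-based confidence (assumed form): 1 - H(p) / log(K).

    Returns 1.0 for a one-hot vector and 0.0 for the uniform distribution.
    Requires at least two classes.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


# Two hypothetical predictions that share the same maximum score.
a = [0.5, 0.5, 0.0]
b = [0.5, 0.25, 0.25]

for name, p in (("a", a), ("b", b)):
    print(name, max_score(p), margin(p), round(entropy_confidence(p), 3))
```

Both vectors have a maximum score of 0.5, so the baseline cannot tell them apart, while the margin and the entropy-based measure each separate them (and, in this example, rank them in opposite orders). This is the sense in which the maximum score disregards the rest of the probability vector.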
