Google Research

A Tale of Two Long Tails

  • Daniel D'Souza
  • Zach Nussbaum
  • Chirag Agarwal
  • Sara Hooker


As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with their predictions. However, the majority of work on uncertainty has focused on traditional probabilistic or ranking approaches, where the model assigns low probabilities or scores to uncertain examples. While this captures which predictions are challenging for the model, it does not capture the underlying source of the uncertainty. In this work, we seek to identify examples the model is uncertain about and the source of that uncertainty. This requires characterizing the difference between examples dominated by epistemic and aleatoric uncertainty. Our targeted, adaptive intervention measures the change in relative uncertainty in the presence of additional information. We show this is an effective way to characterize both example-level uncertainty and its source. This has important downstream applicability, as the remedies for noisy and atypical examples are radically different: typically, a practitioner wants to upweight atypical examples and downweight or remove noisy examples entirely.
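To make the epistemic/aleatoric distinction concrete: a common way to separate the two in practice (not necessarily the intervention proposed in this paper) is the ensemble-based decomposition, where total predictive entropy splits into the mean per-member entropy (aleatoric, irreducible noise) and the residual disagreement between members (epistemic, reducible with more information). A minimal sketch with NumPy, using an assumed `probs` array of per-member softmax outputs for a single example:

```python
import numpy as np

def uncertainty_decomposition(probs):
    """Split total predictive uncertainty into aleatoric and epistemic parts.

    probs: array of shape (n_members, n_classes), softmax outputs from an
    ensemble for one example. Returns (total, aleatoric, epistemic) in nats.
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # guard against log(0)
    mean_p = probs.mean(axis=0)
    # Entropy of the averaged prediction: total uncertainty.
    total = -np.sum(mean_p * np.log(mean_p + eps))
    # Mean entropy of the individual predictions: aleatoric part.
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Disagreement between members (mutual information): epistemic part.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Members agree on a 50/50 split: the example is noisy (aleatoric dominates).
noisy = [[0.5, 0.5], [0.5, 0.5]]
# Members disagree confidently: the example is atypical (epistemic dominates).
atypical = [[0.9, 0.1], [0.1, 0.9]]

_, al_n, ep_n = uncertainty_decomposition(noisy)
_, al_a, ep_a = uncertainty_decomposition(atypical)
```

In the noisy case the epistemic term vanishes even though total uncertainty is high, which is exactly the separation a practitioner needs before deciding whether to upweight or remove an example.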
