Plex: Towards Reliability using Pretrained Large Model Extensions

Dustin Tran; Jeremiah Liu; Michael W. Dusenberry; Du Phan; Mark Patrick Collier; Jie Jessie Ren; Kehang Han; Zi Wang; Zelda Mariet; Clara Huiyi Hu; Neil Band; Tim G. J. Rudner; Karan Singhal; Zachary Nado; Joost van Amersfoort; Andreas Christian Kirsch; Rodolphe Jenatton; Nithum Thain; Honglin Yuan; Kelly Buchanan; Kevin Patrick Murphy; D. Sculley; Yarin Gal; Zoubin Ghahramani; Jasper Roland Snoek; Balaji Lakshminarayanan

ICML 2022 Pre-training Workshop (2022)

Abstract

A recent trend in artificial intelligence (AI) is the use of pretrained models for language and vision tasks. These models have achieved extraordinary performance but also exhibit puzzling failures, so examining tasks that probe their abilities in diverse ways is critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot learning). We devise 11 types of tasks over 36 datasets to evaluate different aspects of reliability in both the vision and language domains. To improve reliability, we develop ViT-Plex and T5-Plex, pretrained large-model extensions (henceforth abbreviated as Plex) for the vision and language modalities. Plex greatly improves the state of the art across tasks, and as a single pretrained model it unifies the traditional protocol of designing and tuning one model for each reliability task. We demonstrate scaling effects over model sizes and over pretraining dataset sizes of up to 4 billion examples. We also demonstrate Plex's capabilities on new tasks, including zero-shot open set recognition, few-shot uncertainty, and uncertainty in conversational language understanding.
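As a rough illustration of two of the evaluation notions named in the abstract, the sketch below computes selective-prediction accuracy (accuracy on only the most confident fraction of predictions) and mean log-likelihood as a scoring rule. It is not the paper's evaluation code: the function names, the `coverage` parameter, and the toy inputs are assumptions made for this example.

```python
import math


def selective_accuracy(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction of examples with highest confidence.

    Hypothetical helper for illustration: the model "abstains" on the
    remaining, least confident examples.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    n_keep = max(1, math.ceil(coverage * len(confidences)))
    # Rank examples from most to least confident and keep the top n_keep.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    kept = ranked[:n_keep]
    return sum(1 for _, is_correct in kept if is_correct) / len(kept)


def mean_log_likelihood(probs_of_true_label):
    """Average log-probability the model assigned to the true labels."""
    return sum(math.log(p) for p in probs_of_true_label) / len(probs_of_true_label)


# Toy inputs (made up): per-example confidence and whether the prediction was right.
confidences = [0.9, 0.8, 0.6, 0.55]
correct = [True, True, False, True]

acc_half = selective_accuracy(confidences, correct, coverage=0.5)
acc_full = selective_accuracy(confidences, correct, coverage=1.0)
ll = mean_log_likelihood([0.5, 0.5])
```

With these toy inputs, accuracy at 50% coverage is higher than accuracy at full coverage, which is the behavior expected when confidence is informative about correctness.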

Research Areas

Machine intelligence
