Kehang Han
I joined Google Brain as an AI Resident in 2020 and now am a Research Engineer there. My research focus has been Graph Neural Networks and Bayesian methods to make deep learning models more reliable (e.g., reduce overconfidence, improve calibration) under distributional shift. Before that I was a senior data scientist at Staples working on Operations Research and Internet of Things. I obtained PhD from MIT on Chemical Engineering and Computation.
Authored Publications
Sort By
Plex: Towards Reliability using Pretrained Large Model Extensions
Du Phan
Mark Patrick Collier
Zi Wang
Zelda Mariet
Clara Huiyi Hu
Neil Band
Tim G. J. Rudner
Karan Singhal
Joost van Amersfoort
Andreas Christian Kirsch
Rodolphe Jenatton
Honglin Yuan
Kelly Buchanan
Yarin Gal
ICML 2022 Pre-training Workshop (2022)
Preview abstract
A recent trend in artificial intelligence (AI) is the use of pretrained models for language and vision tasks, which has achieved extraordinary performance but also puzzling failures. Examining tasks that probe the model’s abilities in diverse ways is therefore critical to the field. In this paper, we explore the \emph{reliability} of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks such as uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot learning). We devise 11 types of tasks over 36 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, \emph{p}retrained \emph{l}arge-model \emph{ex}tensions (henceforth abbreviated as \emph{plex}) for vision and language modalities. Plex greatly improves the state-of-the-art across tasks, and as a pretrained model Plex unifies the traditional protocol of designing and tuning one model for each reliability task. We demonstrate scaling effects over model sizes and pretraining dataset sizes up to 4 billion examples. We also demonstrate Plex’s capabilities on new tasks including zero-shot open set recognition, few-shot uncertainty, and uncertainty in conversational language understanding.
View details
Preview abstract
The concern of overconfident mis-predictions under distributional shift demands extensive reliability research on Graph Neural Networks used in critical tasks in drug discovery. Here we first introduce CardioTox, a real-world benchmark on drug cardio-toxicity to facilitate such efforts. Our exploratory study shows overconfident mis-predictions are often distant from training data. That leads us to develop distance-aware GNNs: GNN-SNGP. Through evaluation on CardioTox and three established benchmarks, we demonstrate GNN-SNGP's effectiveness in increasing distance-awareness, reducing overconfident mis-predictions and making better calibrated predictions without sacrificing accuracy performance. Our ablation study further reveals the representation learned by GNN-SNGP improves distance-preservation over its base architecture and is one major factor for improvements.
View details
Preview abstract
DNA-Encoded Libraries (DEL) data, often with millions of data points, enables large deep learning models to make real contributions in drug discovery (e.g., hit-finding). The state-of-the-art method of modeling DEL data, GCNN multiclass model, requires domain experts to create mutually exclusive classification labels from multiple selection readouts of DEL data, which is not always an optimal formulation. In this work, we designed a GCNN multilabel architecture that directly models each selection data to eliminate dependency on human expertise. We selected effective choices for key modeling components such as label reduction scheme from in silico evaluation. To assess its performance in real-world drug discovery settings, we further carried out prospective wet-lab testing where the multilabel model shows consistent improvement in hit-rate (percentage of hits in a proposed molecule list) over the state-of-the-art multiclass model.
View details