David J Fleet
Authored Publications

Towards Generalist Biomedical AI
Danny Driess, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Simon Kornblith, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Sara Mahdavi, Bradley Green, Ewa Dominowska, Joelle Barral, Karan Singhal, Pete Florence
NEJM AI (2024)

BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights across many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.

METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data, including clinical language, imaging, and genomics, with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.

RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking of 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.

CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand whether cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.
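
The side-by-side reading evaluation reduces to a simple aggregation over per-case rankings. A minimal sketch of that aggregation, with illustrative counts rather than the study's actual data:

```python
# Minimal sketch: aggregating side-by-side preferences between model-generated
# and radiologist reports. The rankings below are illustrative, not the study's.
from collections import Counter

def preference_rate(rankings):
    """rankings: one of 'model', 'radiologist', or 'tie' per case."""
    counts = Counter(rankings)
    total = len(rankings)
    return {k: counts[k] / total for k in ("model", "radiologist", "tie")}

rankings = ["model"] * 100 + ["radiologist"] * 120 + ["tie"] * 26  # 246 cases
print(preference_rate(rankings))
```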
              
  
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang, Chitwan Saharia, Shai Noy, Stefano Pellegrini, Sarah Laszlo, Mohammad Norouzi, Peter Anderson, William Chan
CVPR (2023)
          
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to the input text prompt while remaining consistent with the input image. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by incorporating object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images, exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
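
The object-masking idea is simple to state in code: instead of masking random boxes, the training mask is derived from a detected object's bounding box, so the region to inpaint tends to cover a whole object. A minimal sketch, assuming (x0, y0, x1, y1) box coordinates:

```python
# Illustrative sketch of object-masking, not the paper's training code.
import numpy as np

def mask_from_box(h, w, box):
    """Return a binary mask that is 1 inside the given bounding box."""
    x0, y0, x1, y1 = box
    mask = np.zeros((h, w), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask

def masked_training_input(image, box):
    """Zero out the object region; the model learns to inpaint it from text."""
    mask = mask_from_box(*image.shape[:2], box)
    return image * (1.0 - mask)[..., None], mask

image = np.random.rand(64, 64, 3).astype(np.float32)
corrupted, mask = masked_training_input(image, box=(10, 12, 40, 50))
```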
              
  
Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
Laura Anne Culp, Jan Freyberg, Basil Mustafa, Sebastien Baur, Simon Kornblith, Ting Chen, Patricia MacWilliams, Sara Mahdavi, Megan Zoë Walker, Aaron Loh, Cameron Chen, Scott Mayer McKinney, Jim Winkens, Zach William Beaver, Fiona Keleher Ryan, Justin David Krogue, Mozziyar Etemadi, Umesh Telang, Lily Hao Yi Peng, Geoffrey Everest Hinton, Neil Houlsby, Mohammad Norouzi
Nature Biomedical Engineering (2023)
          
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates this 'out-of-distribution' performance problem and improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images with intermediate contrastive self-supervised learning on medical images, and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1–33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
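
The intermediate self-supervised stage is a contrastive objective over two augmented views of each medical image. A generic NT-Xent (SimCLR-style) loss of the kind this builds on, sketched in numpy rather than taken from the paper's code:

```python
# Illustrative SimCLR-style contrastive loss; a generic sketch, not REMEDIS.
import numpy as np

def nt_xent(z1, z2, tau=0.1):
    """z1, z2: (n, d) embeddings of two augmented views of the same images."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / tau
    n = z.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    pos = (np.arange(n) + n // 2) % n                  # index of the other view
    logits = sim - sim.max(axis=1, keepdims=True)      # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), pos].mean()

z1, z2 = np.random.randn(8, 32), np.random.randn(8, 32)
print(nt_xent(z1, z2))
```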
              
  
A Generalist Framework for Panoptic Segmentation of Images and Videos
Ting Chen, Geoffrey Hinton
International Conference on Computer Vision (ICCV) (2023)
          
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. As permutations of instance IDs are also valid solutions, the task requires learning a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, without relying on the inductive biases of the task. A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function. By simply adding past predictions as a conditioning signal, our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically. With extensive experiments, we demonstrate that our simple approach performs competitively with state-of-the-art specialist methods in similar settings.
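
To let a continuous diffusion model generate discrete per-pixel IDs, one encoding from the same authors' Bit Diffusion work represents each integer label as "analog bits". A self-contained sketch of that encoding, assuming 16-bit labels (the exact representation details in the paper may differ):

```python
# Sketch of the analog-bits encoding for discrete panoptic labels.
import numpy as np

def ids_to_analog_bits(ids, n_bits=16):
    """Map integer labels (...,) to {-1, +1} bit vectors (..., n_bits)."""
    bits = (ids[..., None] >> np.arange(n_bits)) & 1
    return bits.astype(np.float32) * 2.0 - 1.0

def analog_bits_to_ids(bits):
    """Threshold real-valued model outputs back to integer labels."""
    hard = (bits > 0).astype(np.int64)
    return (hard << np.arange(bits.shape[-1])).sum(axis=-1)

panoptic = np.random.randint(0, 2**16, size=(4, 4))
assert np.array_equal(analog_bits_to_ids(ids_to_analog_bits(panoptic)), panoptic)
```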
              
  
Palette: Image-to-Image Diffusion Models
Chitwan Saharia, Chris A. Lee, Huiwen Chang, Jonathan Ho, Mohammad Norouzi, William Chan
SIGGRAPH (2022)
          
This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates this framework on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-specific hyper-parameter tuning, architecture customization, auxiliary losses, or other sophisticated new techniques. We uncover the impact of an L2 vs. L1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention in the neural architecture through empirical studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task diffusion model performs as well as or better than task-specific specialist counterparts. See https://diffusion-palette.github.io/ for an overview of the results.
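
The L2 vs. L1 choice the paper studies sits in one place: the norm applied to the noise-prediction error in the denoising objective. A schematic training-loss sketch (the noise schedule and shapes here are illustrative assumptions, not the paper's configuration):

```python
# Schematic denoising-diffusion loss with a switchable L1/L2 norm.
import numpy as np

def diffusion_loss(x0, model, norm="l2", rng=np.random):
    t = rng.uniform(0.0, 1.0)                     # continuous noise level
    alpha = np.cos(0.5 * np.pi * t)               # illustrative cosine schedule
    sigma = np.sin(0.5 * np.pi * t)
    eps = rng.standard_normal(x0.shape)
    x_t = alpha * x0 + sigma * eps                # noisy input at level t
    eps_hat = model(x_t, t)                       # model predicts the noise
    err = eps_hat - eps
    return np.mean(err ** 2) if norm == "l2" else np.mean(np.abs(err))

x0 = np.random.rand(8, 8, 3)
print(diffusion_loss(x0, model=lambda x, t: np.zeros_like(x), norm="l1"))
```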
              
  
A Unified Sequence Interface for Vision Tasks
Ting Chen, Tsung-Yi Lin, Geoffrey Hinton
Advances in Neural Information Processing Systems (NeurIPS) (2022)
          
While language tasks are naturally expressed in a single, unified modeling framework, i.e., generating sequences of tokens, this has not been the case in computer vision. As a result, there is a proliferation of distinct architectures and loss functions for different vision tasks. In this work we show that a diverse set of "core" computer vision tasks can also be unified if formulated in terms of a shared pixel-to-sequence interface. We focus on four tasks, namely object detection, instance segmentation, keypoint detection, and image captioning, all with diverse types of outputs, e.g., bounding boxes or dense masks. Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization. To solve a specific task, we use a short prompt as a task description, and the sequence output adapts to the prompt so it can produce task-specific output. We show that such a model can achieve competitive performance compared to well-established task-specific models.
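
The key mechanic is that one token stream is parsed differently depending on the task prompt. A toy sketch of such an interface; the vocabulary layout (coordinate bins, class and text token offsets) is an assumption for illustration:

```python
# Toy prompt-conditioned output parser for a shared token interface.
N_BINS, CLASS_BASE, TEXT_BASE = 1000, 1000, 2000

def parse_output(task, tokens):
    if task == "detect":        # groups of 4 coordinate tokens + 1 class token
        objs = [tokens[i:i + 5] for i in range(0, len(tokens), 5)]
        return [([t / N_BINS for t in g[:4]], g[4] - CLASS_BASE) for g in objs]
    if task == "caption":       # tokens index into a text vocabulary
        return [t - TEXT_BASE for t in tokens]
    raise ValueError(task)

print(parse_output("detect", [100, 200, 500, 800, 1003]))
print(parse_output("caption", [2005, 2017, 2002]))
```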
              
  
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen, Geoffrey Everest Hinton
International Conference on Learning Representations (2022)
          
We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural network knows where the objects are and what they are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.
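
The paper's sequence construction quantizes continuous box coordinates into discrete bins and interleaves them with class tokens. A sketch of that construction; the bin count and vocabulary layout are assumptions, not the paper's exact configuration:

```python
# Sketch of Pix2seq-style target-sequence construction.
def box_to_tokens(box, label, n_bins=1000, class_base=1000):
    """box: (ymin, xmin, ymax, xmax), normalized to [0, 1]."""
    coords = [min(int(c * n_bins), n_bins - 1) for c in box]
    return coords + [class_base + label]

def sequence_for_image(objects):
    """objects: list of (box, label); output is one flat target sequence."""
    seq = []
    for box, label in objects:
        seq += box_to_tokens(box, label)
    return seq

print(sequence_for_image([((0.1, 0.2, 0.5, 0.8), 3), ((0.0, 0.0, 1.0, 1.0), 7)]))
```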
              
  
Kubric: A scalable dataset generator
Anissa Yuenming Mak, Austin Stone, Carl Doersch, Cengiz Oztireli, Charles Herrmann, Daniel Rebain, Derek Nowrouzezahrai, Dmitry Lagun, Fangcheng Zhong, Florian Golemo, Francois Belletti, Henning Meyer, Hsueh-Ti (Derek) Liu, Issam Laradji, Klaus Greff, Kwang Moo Yi, Lucas Beyer, Matan Sela, Noha Radwan, Thomas Kipf, Tianhao Wu, Vincent Sitzmann, Yilun Du, Yishu Miao
(2022)
          
Data is the driving force of machine learning. The amount and quality of training data is often more important for the performance of a system than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. Synthetic data generation is a powerful tool that can address many of the shortcomings of collected data: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is less well developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations. We also publish a collection of generated datasets and baseline results on several vision tasks.
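
The core appeal of synthetic data is that the image and its annotations come from the same program, so labels are exact and free. A toy illustration of that idea (this is not Kubric's actual API, just a stand-in generator):

```python
# Toy synthetic-data generator: rendering and ground truth share one program.
import numpy as np

def generate_example(rng, h=64, w=64, n_objects=3):
    image = np.zeros((h, w, 3), dtype=np.float32)
    segmentation = np.zeros((h, w), dtype=np.int32)   # exact instance masks
    boxes = []
    for obj_id in range(1, n_objects + 1):
        y, x = rng.integers(0, h - 16), rng.integers(0, w - 16)
        image[y:y + 16, x:x + 16] = rng.random(3)     # "render" a square
        segmentation[y:y + 16, x:x + 16] = obj_id
        boxes.append((y, x, y + 16, x + 16))          # exact bounding boxes
    return {"rgb": image, "segmentation": segmentation, "boxes": boxes}

example = generate_example(np.random.default_rng(0))
```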
              
  
Image Super-Resolution via Iterative Refinement
Chitwan Saharia, Jonathan Ho, Mohammad Norouzi, William Chan
Submission to ICCV 2021
          
We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process. Inference starts with pure Gaussian noise and iteratively refines the noisy output using a U-Net model trained on denoising at various noise levels. SR3 exhibits strong performance on super-resolution tasks at different magnification factors, on faces and natural images. We conduct human evaluation on a standard 8× face super-resolution task on CelebA-HQ, comparing with SOTA GAN methods. SR3 achieves a fool rate close to 50%, suggesting photo-realistic outputs, while GAN baselines do not exceed a fool rate of 34%. We further show the effectiveness of SR3 in cascaded image generation, where generative models are chained with super-resolution models, yielding competitive FID scores on ImageNet.
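
The inference procedure described above is an iterative loop: start from Gaussian noise and repeatedly denoise, conditioned on the low-resolution input. A schematic sketch of such a loop; the update rule is a simplified stand-in, not the paper's exact sampler:

```python
# Schematic iterative-refinement sampler (simplified, illustrative update).
import numpy as np

def sr3_sample(denoiser, lowres, shape, steps=100, rng=np.random):
    x = rng.standard_normal(shape)                    # start from pure noise
    for t in reversed(range(1, steps + 1)):
        noise_level = t / steps
        x0_hat = denoiser(x, lowres, noise_level)     # U-Net denoising estimate
        z = rng.standard_normal(shape) if t > 1 else 0.0
        # move toward the estimate, re-injecting a little noise each step
        x = x0_hat + 0.5 * noise_level * (x - x0_hat) + 0.1 * noise_level * z
    return x

dummy = lambda x, lr, s: x * (1 - s)                  # stand-in denoiser
out = sr3_sample(dummy, lowres=None, shape=(32, 32, 3), steps=50)
```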
              
  
Cascaded Diffusion Models for High Fidelity Image Generation
Jonathan Ho, Chitwan Saharia, William Chan, Mohammad Norouzi
https://cascaded-diffusion.github.io/ (2021)
          
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower-resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep, and classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256x256, outperforming VQ-VAE-2.
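
Conditioning augmentation corrupts the low-resolution conditioning image before a super-resolution stage sees it, so sampling errors from earlier stages don't compound. A minimal sketch using Gaussian noise augmentation; the noise range is an illustrative assumption:

```python
# Minimal sketch of Gaussian conditioning augmentation for a cascaded stage.
import numpy as np

def augment_conditioning(lowres, rng=np.random, max_sigma=0.3):
    sigma = rng.uniform(0.0, max_sigma)               # randomized per example
    noisy = lowres + sigma * rng.standard_normal(lowres.shape)
    return noisy, sigma           # sigma can also be fed to the model as input

lowres = np.random.rand(64, 64, 3)
noisy_cond, sigma = augment_conditioning(lowres)
```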
              
  