Jeffrey Dean

I joined Google in mid-1999, and I'm currently Google's Chief Scientist, focusing on AI advances for Google DeepMind and Google Research. My areas of focus include machine learning, AI, and applications of AI to problems that help billions of people in societally beneficial ways. I have a broad variety of interests, including machine learning, large-scale distributed systems, computer systems performance, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and the development of new products that organize information in new and interesting ways. My Google Scholar page has a complete list of research papers I have co-authored.

In 2011, I co-founded the Google Brain project/team, focused on making progress towards intelligent machines. Since then, my individual work has focused on research, systems and applications for AI and ML, as well as steering the direction of our broader AI/ML and computer science research community. For the past few years, I’ve had the great pleasure to write a blog post early each year summarizing many pieces of the public work done by amazing colleagues and researchers over the previous year in our research teams (despite the similar-sounding titles, these annual blog posts are each quite different!).

Some of the areas I’ve worked on in AI and ML (generally with many collaborators!) include:
  • Research leadership. Steering the research directions of the Google Brain team, Google Research, and now Google DeepMind (with many others!). See the year-end blog posts mentioned above for more details. This work includes advances in areas like the Transformer architecture, machine learning systems (DistBelief, TensorFlow, Pathways), JAX, TPUs, the Inception model, word2vec, seq2seq models, neural machine translation, distillation, neural architecture search/AutoML, RankBrain, BERT, PaLM, PaLM 2, PaLI, PaLM-E, Med-PaLM, NeRF, quantum computing advances, ML for chip design, computational photography (e.g. Night Sight & Magic Eraser), flood forecasting, Responsible AI research areas like bias, fairness and interpretability, medical diagnostics, auction theory, open-source software and datasets, accessibility, weather forecasting, ML for robotics, connectomics, genomics, and more, as well as research impact in products across nearly all of Google, including Search, Ads, YouTube, Gmail, Workspace, Maps, News, Photos, Translate, Android, Cloud, Pixel, Waymo, and many more products.

  • Computer systems for ML. The design and implementation of three generations of systems for training and deploying deep learning models: DistBelief, TensorFlow, and Pathways.

    In DistBelief, we explored large-scale, highly distributed systems and asynchronous training algorithms to enable ML models to be trained on large amounts of data, even on the relatively slow, non-ML-optimized hardware of the time (we trained models with 2B non-embedding parameters at a time when the largest models reported in the literature were 10M to 50M parameters). The system was used for hundreds of projects within Google and had widespread use across many Google products. Some of the earliest research work we did using DistBelief was exploring unsupervised learning on video frames to see what sorts of representations would emerge, in Building high-level features using large scale unsupervised learning, a.k.a "the cat neuron paper". We also used DistBelief to develop word2vec, various speech recognition models, multimodal work like DeViSE, and early embedding models like RankBrain.

    TensorFlow: I was one of the primary designers and implementers of the initial TensorFlow system. I made the case that we should open-source TensorFlow, and we released it as an open-source project on GitHub in 2015. It is used by millions of researchers and developers all over the world to explore and create ML and AI systems on platforms ranging from tiny embedded systems to phones, desktop computers, and ML supercomputers. For detailed papers on TensorFlow, see TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (white paper) and TensorFlow: A System for Large-Scale Machine Learning (OSDI 2016).

    Pathways is designed to support large-scale, multimodal, sparse architectures that are capable of solving thousands or millions of tasks. I was one of the original designers and implementers, and a paper about the systems research aspects of Pathways appeared in MLSys 2022 as Pathways: Asynchronous Distributed Dataflow for ML. The underlying system software has been used for work like the PaLM language models (which underlie work like Med-PaLM, https://sites.research.google/med-palm/, and PaLM-E for robotics), PaLI, and other downstream uses.

  • Language modeling. I have worked on many different projects related to language modeling, starting with work in 2007 that trained 300-billion-parameter language models on trillions of tokens of text (Large language models in machine translation), demonstrating significant improvements in translation quality.

    I was a co-author on a pair of papers that introduced an approach for learning distributed representations of words that is now commonly called word2vec (Efficient estimation of word representations in vector space and Distributed representations of words and phrases and their compositionality).

    I was one of many people who helped convert Google Translate to a neural machine translation system, which brought further significant gains in translation quality. See Google’s neural machine translation system: Bridging the gap between human and machine translation (2016) and Google’s multilingual neural machine translation system: Enabling zero-shot translation. Gideon Lewis-Kraus of The New York Times Magazine wrote an in-depth feature about the rollout of the neural machine translation system in Google Translate in The Great AI Awakening.

    Part of the Pathways infrastructure work is designed to enable training larger models on larger and more diverse datasets. I worked on the PaLM language models, and I am one of the co-leads of the Gemini effort, which is building next-generation multimodal models that can use tools and APIs and that can be used in a variety of Google products and application areas.

  • Distillation. I am one of the co-creators of a machine learning technique called distillation, a now-widely-used approach for transferring the knowledge from one neural network to another. It is often used to create smaller, much more efficient models for inference from larger, more unwieldy models, and it can also be used to transfer knowledge from one neural network architecture to a completely different architecture. See Distilling the Knowledge in a Neural Network.

  • Sparse models. I have been involved in a line of work on sparse model architectures for neural networks, including Outrageously large neural networks: The sparsely-gated mixture-of-experts layer (2017) and Designing Effective Sparse Expert Models. A review of approaches for sparse models appears in A Review of Sparse Expert Models in Deep Learning.

  • AI for ASIC chip design. I have worked on research on how to apply reinforcement learning to the problem of placement and routing in ASIC chip design. We have shown that a system that runs in a few hours can achieve performance as good as or better than human performance on the problem of chip floorplanning. Our work here was published in Nature and has been used for multiple generations of Google’s TPU ML accelerators.

  • ML for healthcare. I have worked on the use of AI and machine learning in healthcare settings. We have done work showing that machine learning on de-identified medical records can produce useful and actionable suggestions for clinicians, published as Scalable and Accurate Deep Learning with Electronic Health Records. The broader research community at Google has also done work on applying machine learning across many different problems in health, including medical imaging diagnostics, genomics, medical note transcription and summarization, and novel sensing (see health sections of year-in-review blog posts above). I’ve also collaborated on a couple of review articles in this space. One assessed some of the most promising directions for integrating deep learning into healthcare settings, and was published in Nature Medicine as A Guide to Deep Learning in Healthcare. The other was a NEJM article titled Machine Learning in Medicine.

  • ML for computer systems. I have worked with many others on advancing the use of machine learning for tackling computer systems problems. Among these are device placement using reinforcement learning to map abstract ML computation graphs onto a set of physical devices in order to give the best performance (and some follow-on work on a hierarchical version of this), and the use of learned index structures in database systems instead of traditional data structures like B-trees and hash tables.

  • Energy efficiency of machine learning. I have helped push forward Google’s TPU efforts, identifying fairly early in the widespread use of deep learning that creating efficient systems was going to require building customized accelerator hardware, leading to a long line of TPU processors. TPUv1 (In-datacenter Performance Analysis of a Tensor Processing Unit) targeted inference computations and delivered about 30X to 80X better performance per watt than contemporary CPUs and GPUs. Subsequent TPU generations target both training and inference in large-scale ML accelerator systems and are crucial to much of the machine learning research and product applications of ML at Google. They are available to external entities as Google Cloud TPUs.

    Carbon emissions from machine learning training are an area rife with misinformation due to the prevalence of flawed and inaccurate estimates, so I have also worked with others to correct some of this misinformation and put actual measured data into the literature. See Carbon emissions and large neural network training, especially appendices C and D, and The carbon footprint of machine learning training will plateau, then shrink (if ML researchers adopt best practices). I gave a talk on some of these issues at the 2022 MIT Climate Impacts of Computing and Communications workshop.
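The Distillation item in the list above describes transferring knowledge from a larger "teacher" network to a smaller "student". As a rough illustration, here is a minimal, dependency-free Python sketch of the kind of loss described in Distilling the Knowledge in a Neural Network: the student is trained to match the teacher's temperature-softened output distribution, combined with ordinary cross-entropy on the true label. The function names, the default temperature of 4.0, and the 0.5 mixing weight are illustrative assumptions, not values taken from that paper or from any Google system.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature (max-shifted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], predicted: Sequence[float]) -> float:
    """H(target, predicted) = -sum_i target_i * log(predicted_i)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      true_label: int,
                      temperature: float = 4.0,   # assumed value
                      alpha: float = 0.5) -> float:  # assumed mixing weight
    """Weighted sum of a soft-target loss and an ordinary hard-label loss.

    alpha weights the soft term; (1 - alpha) weights the hard term.
    """
    # Soft term: the student's softened distribution should match the
    # teacher's, both computed at the same (high) temperature. Multiplying
    # by T**2 keeps this term's gradient scale roughly independent of T.
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_targets, soft_preds) * temperature ** 2

    # Hard term: standard cross-entropy with the true label at temperature 1.
    hard_preds = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_preds[true_label])

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [6.0, 2.0, -1.0]  # confident, but still ranks class 1 above class 2
    student = [4.0, 1.5, 0.0]
    print(distillation_loss(student, teacher, true_label=0))
```

Raising the temperature flattens the teacher's distribution, so the student also learns how the teacher ranks the incorrect classes, which is the extra information distillation relies on. In a real training loop, the teacher's logits would come from a trained large model, and only the student's parameters would be updated by gradient descent on this loss; that machinery is omitted here.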

While at Google, I've also worked on the following:
  • Google Search. The design and implementation of five generations of our crawling, indexing, and query serving systems, covering between two and three orders of magnitude of growth in the number of documents searched, the number of queries handled per second, and the frequency of updates to the system. We did not publish research papers on most aspects of this, but I gave a talk at WSDM'09 about some of the issues involved in building large-scale retrieval systems (slides).
  • Search ranking algorithms. Some aspects of our search ranking algorithms, notably improved handling of off-page signals such as anchor text.
  • Search ranking prototyping system. The design and implementation of prototyping infrastructure for rapid development and experimentation with new ranking algorithms.
  • MapReduce. The design and implementation of MapReduce, a system for simplifying the development of large-scale data processing applications. A paper about MapReduce appeared in OSDI'04. MapReduce is used extensively within Google, and provided the inspiration for external open-source projects like Hadoop, as well as follow-on projects like Flume.

  • Bigtable. The design and implementation of Bigtable, a large-scale semi-structured storage system. A paper about Bigtable appeared in OSDI'06. Bigtable is used by hundreds of teams at Google and sits underneath dozens of products. It is available externally as Cloud Bigtable. As of 2023, Bigtable processes more than 6 billion requests per second at peak and has over 10 exabytes of data under management.

  • Spanner. The design and implementation of Spanner, a globally distributed storage system that can provide strong consistency guarantees through the use of Paxos and highly synchronized clocks in multiple data centers. A paper about Spanner appeared in OSDI’12. Spanner is used by hundreds of projects within Google, underlies a large fraction of our products, and is available externally as Google’s Cloud Spanner product.

  • Google Ads. I was part of a group of three people who did the design and implementation of the initial version of Google's advertising serving system.
  • AdSense. The initial development of Google's AdSense for Content product (involving both the production serving system design and implementation as well as work on developing and improving the quality of ad selection based on the contents of pages).
  • Protocol buffers. The development of Protocol Buffers, a way of encoding structured data in an efficient yet extensible format, and a compiler that generates convenient wrappers for manipulating the objects in a variety of languages. Protocol Buffers are used extensively at Google for almost all RPC protocols, and for storing structured information in a variety of persistent storage systems. A version of the protocol buffer implementation has been open-sourced and is available at https://github.com/protocolbuffers/protobuf/, and a developer site with documentation and more details is at https://protobuf.dev/.
  • Google News. Some of the initial production serving system work for the Google News product, working with Krishna Bharat to move the prototype system he put together into a deployed system.

  • Job scheduling system. The design and implementation of the first generation of our automated job scheduling system for managing a cluster of machines.
  • Timeseries analysis system. The initial design and implementation of a system for analyzing complex timeseries data. This system is used extensively by dozens of Google teams to support various use cases like suggested completions, recommendations, etc. The system is available for Cloud customers to analyze their own datasets via the Timeseries Insights API.

  • Google Translate. Some of the production system design for Google Translate, our statistical machine translation system. In particular, I designed and implemented a system for distributed high-speed access to very large language models (too large to fit in memory on a single machine), and then later helped with the transition to using neural machine translation models.
  • LevelDB. The design and implementation of LevelDB, a high-performance key-value store that we released as an open-source project. It is used in a wide variety of projects, including Google Chrome.

  • Code search. Some internal tools to make it easy to rapidly search our internal source code repository. Many of the ideas from this internal tool were incorporated into our Google Code Search product, including the ability to use regular expressions for searching large corpora of source code.
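The MapReduce item above describes a system for simplifying large-scale data processing. The heart of its programming model is that users write a map function that emits intermediate key/value pairs and a reduce function that merges all values sharing the same key. The following single-process Python sketch uses word counting, the running example in the OSDI'04 paper, to show just that data flow. The function names are hypothetical, and a real MapReduce implementation partitions the work across many machines, shuffles intermediate data between them, and handles machine failures, none of which is modeled here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Type aliases for this word-count-specific sketch (hypothetical names).
MapFn = Callable[[str, str], Iterator[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], int]


def run_mapreduce(inputs: Iterable[Tuple[str, str]],
                  map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[str, int]:
    """Run map, group-by-key ("shuffle"), and reduce sequentially in one process."""
    # Map phase: apply map_fn to every (key, value) input record and
    # collect the emitted values under their intermediate keys.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)

    # Reduce phase: call reduce_fn once per distinct intermediate key,
    # visiting keys in sorted order.
    return {k: reduce_fn(k, intermediate[k]) for k in sorted(intermediate)}


def word_count_map(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in a document."""
    for word in contents.split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    """Sum all the partial counts emitted for one word."""
    return sum(counts)


if __name__ == "__main__":
    docs = [("a.txt", "the cat sat"), ("b.txt", "the dog sat")]
    print(run_mapreduce(docs, word_count_map, word_count_reduce))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

Because the map and reduce functions are pure and only see one record or one key at a time, the framework is free to run many copies of them in parallel and to re-execute any that fail, which is what makes the model attractive at cluster scale.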
I enjoy developing software with great colleagues, and I've been fortunate to have worked with many wonderful and talented people on all of my work here at Google. To help ensure that Google continues to hire people with excellent technical skills, I've also been fairly involved in our engineering hiring process.

I received a Ph.D. in computer science from the University of Washington in 1996, working on compiler optimizations for object-oriented languages advised by Craig Chambers. I received a B.S. in computer science and economics (summa cum laude) from the University of Minnesota in 1990 (doing honors theses on parallel training of neural networks and the economic impact of HIV/AIDS).

From 1996 to 1999, I worked for Digital Equipment Corporation's Western Research Lab in Palo Alto, where I worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, I worked for the World Health Organization's Global Programme on AIDS, developing software to do statistical modeling, forecasting, and analysis of the HIV pandemic. In high school and during the summers in college, I worked first at the Centers for Disease Control and later at the World Health Organization developing a series of versions of software called Epi Info (wikipedia) for analyzing epidemiological data (still one of my most cited works).

In 2009, I was elected to the National Academy of Engineering, and in 2016, I was elected as a member of the American Academy of Arts and Sciences. I was also named a Fellow of the Association for Computing Machinery (ACM) and a Fellow of the American Association for the Advancement of Science (AAAS). I am a recipient of the ACM Prize in Computing (2012, with my long-time colleague Sanjay Ghemawat), the IEEE John von Neumann Medal (video), and the Mark Weiser Award.

James Somers of the New Yorker wrote a delightful article in 2018 about me and my long-time collaborator Sanjay Ghemawat and how we work together: The Friendship That Made Google Huge.

Selected slides/talks:

Note that talks with similar titles sometimes end up having different mixes of content.

Some of the papers I’ve co-authored with awesome colleagues have been fortunate enough to win various awards:
  • NeurIPS 2023 Test of Time Award (for Distributed Representations of Words and Phrases and their Compositionality published at NeurIPS 2013)
  • Outstanding Paper Award, MLSys 2022 (for Pathways: Asynchronous Distributed Dataflow for ML)
  • SIGOPS Hall of Fame Award, 2022 (for Spanner: Google’s Globally Distributed Database System at OSDI 2012)
  • Best Paper Award, EuroSys 2018 (for Dynamic Control Flow in Large-Scale Machine Learning)
  • SIGOPS Hall of Fame Award, 2016 (for Bigtable: A Distributed Storage System for Structured Data)
  • SIGOPS Hall of Fame Award, 2015 (for MapReduce: Simplified Data Processing on Large Clusters)
  • Best Paper Award, OSDI 2012 (for Spanner: Google’s Globally Distributed Database System)
  • 10-year Retrospective Most Influential Paper Award from OOPSLA 2007 (for Call Graph Construction in Object-Oriented Languages, 1997)
  • Best Paper Award, OSDI 2006 (for Bigtable: A Distributed Storage System for Structured Data)
  • 10-year Retrospective Most Influential Paper Award from PLDI 2005 (for Selective Specialization for Object-Oriented Languages, 1995)
  • Best Paper Award, SOSP 1997 (for Continuous Profiling: Where Have All the Cycles Gone?)

Personal:

I've lived in lots of places in my life: Honolulu, HI; Manila, the Philippines; Boston, MA; West Nile District, Uganda; Boston (again); Little Rock, AR; Hawaii (again); Minneapolis, MN; Mogadishu, Somalia; Atlanta, GA; Minneapolis (again); Geneva, Switzerland; Seattle, WA; and (currently) Palo Alto, CA. I'm hard-pressed to pick a favorite, though: each place has its pluses and minuses.

One of my life goals is to play soccer and basketball on every continent. So far, I've done so in North America, South America, Europe, Asia, Oceania, and Africa. I'm worried that Antarctica might be tough, though.

Authored Publications
    The carbon footprint of machine learning training will plateau, then shrink
    Many recent papers highlight the importance of thinking about carbon emissions (CO2e) in machine learning (ML) workloads. While elevating the discussion, some early work was also based on incomplete information. (Unfortunately, the most widely cited quantitative estimate that was the basis for many of these papers was off by 88X.) Inspired by these concerns, we looked for approaches that would make ML training considerably less carbon intensive. We identified four best practices that dramatically reduce carbon emissions, and demonstrate two concrete examples of reducing CO2e by 650X over four years and 40X over one year by following them. Provided ML stakeholders follow best practices, we predict that the field will bend the curve of carbon footprint increases from ML training runs to first flatten and then reduce it by 2030 without sacrificing the current rate of rapid advances in ML, contrary to prior dire warnings that ML CO2e will soar.
    PaLM: Scaling Language Modeling with Pathways
    Aakanksha Chowdhery
    Sharan Narang
    Jacob Devlin
    Maarten Bosma
    Hyung Won Chung
    Sebastian Gehrmann
    Parker Schuh
    Sasha Tsvyashchenko
    Abhishek Rao
    Yi Tay
    Noam Shazeer
    Nan Du
    Reiner Pope
    James Bradbury
    Guy Gur-Ari
    Toju Duke
    Henryk Michalewski
    Xavier Garcia
    Liam Fedus
    David Luan
    Barret Zoph
    Ryan Sepassi
    David Dohan
    Shivani Agrawal
    Mark Omernick
    Marie Pellat
    Aitor Lewkowycz
    Erica Moreira
    Rewon Child
    Oleksandr Polozov
    Zongwei Zhou
    Brennan Saeta
    Michele Catasta
    Jason Wei
    arXiv:2204.02311 (2022)
    Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
    Pathways: Asynchronous Distributed Dataflow for ML
    MLSys (2022)
    We present the design of a new large scale orchestration layer for accelerators. Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state of the art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects. Pathways makes use of a novel asynchronous distributed dataflow design that lets the control plane execute in parallel despite dependencies in the data plane. This design, with careful engineering, allows Pathways to adopt a single-controller model that makes it easier to express complex new parallelism patterns. We demonstrate that Pathways can achieve performance parity (~100% accelerator utilization) with state-of-the-art systems when running SPMD computations over 2048 TPUs, while also delivering throughput comparable to the SPMD case for Transformer models that are pipelined across 16 stages, or sharded across two islands of accelerators connected over a data center network.
    Emergent abilities of large language models
    Barret Zoph
    Colin Raffel
    Dani Yogatama
    Jason Wei
    Liam B. Fedus
    Maarten Paul Bosma
    Percy Liang
    Sebastian Borgeaud
    Tatsunori B. Hashimoto
    Yi Tay
    TMLR (2022)
    Scaling up language models has been shown to predictably confer a range of benefits such as improved performance and sample efficiency. This paper discusses an unpredictable phenomenon that we call emergent abilities of large language models. Such emergent abilities have close to random performance until evaluated on a model of sufficiently large scale, and hence their emergence cannot be predicted by extrapolating a scaling law based on small-scale models. The emergence of such abilities suggests that additional scaling could further expand the range of tasks that language models can perform. We discuss the implications of these phenomena and suggest directions for future research.
    Deep learning-enabled medical computer vision
    Andre Esteva
    Kat Chou
    Serena Yeung
    Nikhil Naik
    Ali Madani
    Ali Mottaghi
    Eric Topol
    Richard Socher
    npj Digital Medicine (2021)
    A decade of unprecedented progress in artificial intelligence (AI) has demonstrated the potential for many fields--including medicine--to benefit from the insights that AI techniques can extract from data. Here we survey recent progress in the development of modern computer vision techniques--powered by deep learning--for medical applications, focusing on medical imaging, medical video, and clinical deployment. We start by briefly summarizing a decade of progress in convolutional neural networks, including the vision tasks they enable, in the context of healthcare. Next, we discuss several example medical imaging applications that stand to benefit--including cardiology, pathology, dermatology, ophthalmology--and propose new avenues for continued work. We then expand into general medical video, highlighting ways in which clinical workflows can integrate computer vision to enhance care. Finally, we discuss the challenges and hurdles required for real-world clinical deployment of these technologies.
    Customization Scenarios for De-identification of Clinical Notes
    Danny Vainstein
    Gavin Edward Bee
    Jack Po
    Jutta Williams
    Kat Chou
    Ronit Yael Slyper
    Rony Amira
    Shlomo Hoory
    Tzvika Hartman
    BMC Medical Informatics and Decision Making (2020)
    Background: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets. Objective: We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized. Methods: We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset. Results: Fully customized systems remove 97-99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems. Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.
    Introduction: Auto-charting (the creation of structured sections of clinical notes generated directly from a patient-doctor encounter) holds promise to lift documentation burden from physicians. However, clinicians exercise professional judgement in what and how to document, and it is unknown if a machine learning (ML) model could assist with these tasks. Objective: Build an ML model to extract symptoms and status (i.e. experienced, not-experienced, not relevant for note) from transcripts of patient-doctor encounters and assess performance on common symptoms and conversations in which a human interpreter or scribe is not used. Methods: We generated an ML model to auto-generate a review of systems (ROS) from transcripts of 90,000 de-identified medical encounters. 2950 transcripts were labeled by medical scribes to identify 171 common symptoms. Model accuracy was stratified by how clearly a symptom was mentioned in conversation for 800 snippets, which was assessed by a formal rating system termed conversational clarity. The model was also qualitatively assessed in a variety of conversational motifs. Results: Overall, the model had a sensitivity of 0.71 of matching the exact symptom labeled by a human with a positive predictive value of 0.69. Model sensitivity was associated with conversational clarity (p<0.0001). 39.5% (316/800) of snippets of common symptoms contained symptoms mentioned with high clarity, and in this group, the sensitivity of the model was 0.91. The model was robust to a variety of conversational motifs (e.g. detecting symptoms mentioned in colloquial ways). Conclusions: Auto-generating a review of systems is feasible across a wide range of symptoms that are commonly discussed in doctor-patient encounters.
    An Augmented Reality Microscope with Real-time Artificial Intelligence Integration for Cancer Diagnosis
    Cameron Chen
    Krishna Kumar Gadepalli
    Bob MacDonald
    Shiro Kadowaki
    Kunal Nagpal
    Timo Kohlberger
    Jason Hipp
    Craig Mermel
    Martin Stumpe
    Nature Medicine (2019)
    The microscopic assessment of tissue samples is instrumental for the diagnosis and staging of cancer and thus guides therapy. However, these assessments demonstrate significant variability, and many regions of the world lack access to trained pathologists. Though Artificial Intelligence (AI) promises to improve the access and quality of healthcare, the costs of image digitization in pathology and difficulties in deploying AI solutions remain as barriers to real-world use. Here we propose a cost-effective solution: the Augmented Reality Microscope (ARM). The ARM overlays AI-based information onto the current view of the sample in real-time, enabling seamless integration of AI into routine workflows. We demonstrate the utility of ARM in the detection of metastatic breast cancer and the identification of prostate cancer with latency compatible with real-time use. We anticipate that the ARM will remove barriers towards the use of AI designed to improve the accuracy and efficiency of cancer diagnosis.
    Machine Learning in Medicine
    Alvin Rishi Rajkomar
    Isaac Kohane
    New England Journal of Medicine (2019)
    Dynamic Control Flow in Large-Scale Machine Learning
    Yuan Yu
    Eugene Brevdo
    Mike Burrows
    Tim Harley
    Peter Hawkins
    Manjunath Kudlur
    Rajat Monga
    Xiaoqiang Zheng
    Proceedings of EuroSys 2018
    Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments. This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations. We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability.