Danny Tarlow
I'm a Research Scientist at Google Research, Brain Team in Montreal. I'm primarily interested in machine learning methods for understanding and generating programs. However, I have fairly broad interests across Machine Learning.
On the academic side, I'm also an Adjunct Professor in the School of Computer Science at McGill University and an associate member at MILA. I co-supervise a couple PhD students at MILA and generally spend Friday mornings there. If you're a MILA student and would like to chat on a Friday, send me mail or find me on slack. I hold a Ph.D. from the Machine Learning group at University of Toronto (2013). Before coming to Montreal, I spent four years as a postdoc and then Researcher at Microsoft Research, Cambridge (UK).
Research Areas
Authored Publications
Sort By
Resolving Code Review Comments with Machine Learning
Alexander Frömmgen
Peter Choy
Elena Khrapko
Marcus Revaj
2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (to appear)
Preview abstract
Code reviews are a critical part of the software development process, taking a significant amount of the code authors’ and the code reviewers’ time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. At Google, we see millions of reviewer comments per year, and authors require an average of ∼60 minutes active shepherding time between sending changes for review and finally submitting the change. In our measurements, the required active work time that the code author must devote to address reviewer comments grows almost linearly with the number of comments. However, with machine learning (ML), we have an opportunity to automate and streamline the code-review process, e.g., by proposing code changes based on a comment’s text.
We describe our application of recent advances in large sequence models in a real-world setting to automatically resolve code-review comments in the day-to-day development workflow at Google. We present the evolution of this feature from an asynchronous generation of suggested edits after the reviewer sends feedback, to an interactive experience that suggests code edits to the reviewer at review time. In deployment, code-change authors at Google address 7.5% of all reviewer comments by applying an ML-suggested edit. The impact of this will be to reduce the time spent on code reviews by hundreds of thousands of engineer hours annually at Google scale. Unsolicited, very positive feedback highlights that the impact of ML-suggested code edits increases Googlers’ productivity and allows them to focus on more creative and complex tasks.
View details
AI-assisted Assessment of Coding Practices in Industrial Code Review
Ivan Budiselic
Malgorzata (Gosia) Salawa
Juanjo Carin
Jovan Andonov
Mateusz Lewko
Marko Ivanković
Rene Just
Preview abstract
Modern code review is a process in which incremental code contributions made by one software developer are reviewed by one or more peers before it is committed to the version control system. An important element of modern code review is verifying that the code under review adheres to style guidelines and best practices of the corresponding programming language. Some of these rules are universal and can be checked automatically or enforced via code formatters. Other rules, however, are context-dependent and the corresponding checks are commonly left to developers who are experts in the given programming language and whose time is expensive. Many automated systems have been developed that attempt to detect various rule violations without any human intervention. Historically, such systems implement targeted analyses and were themselves expensive to develop. This paper presents AutoCommenter, a system that uses a state of the art large language model to automatically learn and enforce programming language best practices. We implemented AutoCommenter for four programming languages: C++, Java, Python and Go. We evaluated its performance and adoption in a large industrial setting. Our evaluation shows that a model that automatically learns language best practices is feasible and has a measurable positive impact on the developer workflow. Additionally, we present the challenges we faced when deploying such a model to tens of thousands of developers and provide lessons we learned for any practitioners that would like to replicate the work or build on top of it.
View details
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
Rishab Goel
International Conference on Learning Representations (ICLR) (2023)
Preview abstract
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a ``static'' setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. As an alternative, we develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and ``learns to execute'' descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
View details
Graph Representations of Python Programs via Source-level Static Analysis
Vincent Josua Hellendoorn
Arxiv (2022)
Preview abstract
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case-study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
View details
Preview abstract
The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveal that when given the same budget, our technique significantly improves the success rate over PCCoder [Zohar et. al 2018] and other ablation baselines.
View details
Beyond In-Place Corruption: Insertion and Deletion In Denoising Probabilistic Models
Rianne van den Berg
ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (2021)
Preview abstract
Denoising diffusion probabilistic models have shown impressive results for generation of sequences by iteratively corrupting each example and then learning to map corrupted versions back to the original. However, previous work has largely focused on in-place corruption, adding noise to each pixel or token individually while keeping their locations the same. In this work, we consider a broader class of corruption processes and denoising models over sequence data that can insert and delete elements, while still being efficient to train and sample from. We demonstrate that these models outperform standard in-place models on an arithmetic sequence task, and that when trained on the text8 dataset they can be used to fix spelling errors without any fine-tuning.
View details
Structured Denoising Diffusion Models in Discrete State-Spaces
Jonathan Ho
Rianne van den Berg
Advances in Neural Information Processing Systems (2021) (to appear)
Preview abstract
Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2021] have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models of discrete data that generalize the multinomial diffusion model of Hoogeboom et al. [2021] by going beyond transition matrices with uniform transition probabilities. This includes discrete transition matrices that mimic Gaussian kernels in continuous space, kernels based on nearest neighbors in embedding space, and kernels that introduce masking. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition kernel is an important design decision that leads to improved results in image and text domains. In addition, we show that expressing transition matrices as matrix exponentials leads to efficient implementations and controllable schedules. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation and scales to LM1B with a large subword-tokenized vocabulary. On the image dataset CIFAR-10, our models achieve FID and Inception scores rivaling those of the continuous DDPM model.
View details
PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair
Zimin Chen
Vincent J Hellendoorn
Subhodeep Moitra
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) (2021)
Preview abstract
Machine learning for understanding and editing source code has recently attracted significant interest, with many developments in new models,
new code representations, and new tasks.
This proliferation can appear disparate and disconnected, making each approach seemingly unique and incompatible, thus obscuring the core
machine learning challenges and contributions.
In this work, we demonstrate that the landscape can be significantly simplified by taking a general approach of mapping a graph to a sequence of tokens and pointers.
Our main result is to show that 16 recently published tasks of different shapes can be cast in this form, based on which a single model architecture achieves near or above state-of-the-art results on nearly all tasks, outperforming custom models like code2seq and alternative generic models like Transformers.
This unification further enables multi-task learning and a series of cross-cutting experiments about the importance of different modeling choices for code understanding and repair tasks.
The full framework, called PLUR, is easily extensible to more tasks, and will be open-sourced (link).
View details
Preview abstract
As the performance of computer systems stagnates due to the end of Moore’s Law, there is a need for new models that can understand and optimize the execution of general purpose code. While there is a growing body of work on using Graph Neural Networks (GNNs) to learn static representations of source code, these representations do not understand how code executes at runtime. In this work, we propose a new approach using GNNs to learn fused representations of general source code and its execution. Our approach defines a multi-task GNN over low-level representations of source code and program state (i.e., assembly code and dynamic memory states), converting complex source code constructs and data structures into a simpler, more uniform format. We show that this leads to improved performance over similar methods that do not use execution and it opens the door to applying GNN models to new tasks that would not be feasible from static code alone. As an illustration of this, we apply the new model to challenging dynamic tasks (branch prediction and prefetching) from the SPEC CPU benchmark suite, outperforming the state-of-the-art by 26% and 45% respectively. Moreover, we use the learned fused graph embeddings to demonstrate transfer learning with high performance on an indirectly related algorithm classification task.
View details
Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks
Thirty-fourth Conference on Neural Information Processing Systems (2020)
Preview abstract
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks including code completion, bug finding, and program repair. They benefit from leveraging program structure like control flow graphs, but they are not well-suited to tasks like program execution that require far more sequential reasoning steps than number of GNN propagation steps. Recurrent neural networks (RNNs), on the other hand, are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure and generally perform worse on the above tasks. Our aim is to achieve the best of both worlds, and we do so by introducing a novel GNN architecture, the Instruction Pointer Attention Graph Neural Network (IPA-GNN), which achieves systematic generalization on the task of learning to execute programs using control flow graphs. The model arises by developing a spectrum of models between RNNs operating on program traces with branch decisions as latent variables and GNNs. The IPA-GNN can be seen either as a continuous relaxation of the RNN model or as a GNN variant more tailored to execution. To test the models, we propose evaluating systematic generalization on learning to execute using control flow graphs, which tests sequential reasoning and use of program structure. More practically, we evaluate these models on the task of learning to execute partial programs, as might arise if using the model as a value function in program synthesis. Results show that the IPA-GNN outperforms a variety of RNN and GNN baselines on both tasks.
View details