Daniel D. Johnson

Authored Publications
    Abstract: Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. We then prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors, under a straightforward assumption that positive pairs have similar labels. Our analysis is based on a decomposition of the target function in terms of the eigenfunctions of a positive-pair Markov chain, and a surprising equivalence between these eigenfunctions and the output of Kernel PCA. We give generalization bounds for downstream linear prediction using our kernel PCA representation, and show empirically on a set of synthetic tasks that applying kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength.
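As a concrete illustration of the recipe this abstract sketches (a learned kernel followed by Kernel PCA), here is a minimal numpy sketch. The placeholder RBF kernel, the output dimension `d`, and the textbook centering step are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch: Kernel PCA on top of a learned kernel function.
# `learned_kernel` stands in for the kernel induced by a trained
# contrastive model; the RBF used here is only a placeholder assumption.
import numpy as np

def learned_kernel(X, Y, gamma=1.0):
    # Placeholder kernel; in practice this would approximate the
    # positive-pair kernel learned by a contrastive model.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pca_representation(X, d=8):
    n = X.shape[0]
    K = learned_kernel(X, X)
    # Center the Gram matrix (standard textbook Kernel PCA step).
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Top-d eigenvectors of the centered Gram matrix give the representation.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:d]
    lam, V = np.clip(eigvals[idx], 0, None), eigvecs[:, idx]
    return V * np.sqrt(lam)  # shape (n, d): one d-dimensional vector per point

Z = kernel_pca_representation(np.random.randn(100, 5), d=4)
```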
    Abstract: Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
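A hedged usage sketch of the library described above. The entry points `control_flow.get_control_flow_graph` and `program_graph.get_program_graph` follow the project's documented examples, but treat them as assumptions here, since names and signatures may differ across versions.

```python
# Hedged usage sketch for the python_graphs library; the entry-point names
# below are assumed from the project's documentation.
from python_graphs import control_flow, program_graph

def example(x):
    if x > 0:
        return x
    return -x

# Control-flow graph of the function.
cfg = control_flow.get_control_flow_graph(example)

# Composite "program graph" combining control-flow, data-flow,
# syntactic, and lexical information.
pg = program_graph.get_program_graph(example)

print(type(cfg), type(pg))
```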
    Beyond In-Place Corruption: Insertion and Deletion In Denoising Probabilistic Models
    Rianne van den Berg
    ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (2021)
    Abstract: Denoising diffusion probabilistic models have shown impressive results for generation of sequences by iteratively corrupting each example and then learning to map corrupted versions back to the original. However, previous work has largely focused on in-place corruption, adding noise to each pixel or token individually while keeping their locations the same. In this work, we consider a broader class of corruption processes and denoising models over sequence data that can insert and delete elements, while still being efficient to train and sample from. We demonstrate that these models outperform standard in-place models on an arithmetic sequence task, and that when trained on the text8 dataset they can be used to fix spelling errors without any fine-tuning.
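To make the contrast with in-place corruption concrete, here is a toy Python sketch of a single corruption step that can delete, replace, and insert tokens. The probabilities and vocabulary are illustrative assumptions, not the paper's exact corruption process.

```python
# Toy illustration of a corruption step that can insert and delete tokens,
# in contrast to purely in-place corruption. Probabilities and vocabulary
# are illustrative assumptions.
import random

VOCAB = list("abcdefgh")

def corrupt_step(tokens, p_delete=0.1, p_insert=0.1, p_replace=0.1):
    out = []
    for tok in tokens:
        r = random.random()
        if r < p_delete:
            continue                              # delete this token
        elif r < p_delete + p_replace:
            out.append(random.choice(VOCAB))      # in-place replacement
        else:
            out.append(tok)                       # keep unchanged
        if random.random() < p_insert:
            out.append(random.choice(VOCAB))      # insert a random token after
    return out

x = list("abcdeff")
print("".join(corrupt_step(x)))
```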
    Structured Denoising Diffusion Models in Discrete State-Spaces
    Jonathan Ho
    Rianne van den Berg
    Advances in Neural Information Processing Systems (2021) (to appear)
    Abstract: Denoising diffusion probabilistic models (DDPMs) [Ho et al., 2020] have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models of discrete data that generalize the multinomial diffusion model of Hoogeboom et al. [2021] by going beyond transition matrices with uniform transition probabilities. This includes discrete transition matrices that mimic Gaussian kernels in continuous space, kernels based on nearest neighbors in embedding space, and kernels that introduce masking. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition kernel is an important design decision that leads to improved results in image and text domains. In addition, we show that expressing transition matrices as matrix exponentials leads to efficient implementations and controllable schedules. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation and scales to LM1B with a large subword-tokenized vocabulary. On the image dataset CIFAR-10, our models achieve FID and Inception scores rivaling those of the continuous DDPM model.
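As a rough illustration of the ingredients named in the abstract, the sketch below samples one discrete forward-diffusion step under a uniform transition matrix and shows a matrix-exponential parameterization of the cumulative transition. The vocabulary size, rate matrix, and schedule are illustrative assumptions, not the paper's settings.

```python
# Sketch of a discrete diffusion forward step with a uniform transition
# matrix, plus a matrix-exponential parameterization of the cumulative
# transition. Rate values and schedule are illustrative assumptions.
import numpy as np
from scipy.linalg import expm

K = 8        # vocabulary size (assumed)
beta = 0.05  # per-step corruption strength (assumed)

# Uniform transition matrix: with probability beta, resample uniformly.
Q = (1 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

def forward_step(x, rng):
    """Sample x_t given x_{t-1} under Q; x is an array of token ids."""
    probs = Q[x]                         # transition rows indexed by tokens
    cum = probs.cumsum(axis=-1)
    u = rng.random(x.shape)[..., None]
    return (u < cum).argmax(axis=-1)     # categorical sample per position

# Matrix-exponential view: a rate matrix R with zero row sums gives
# cumulative transitions Q_t = expm(t * R), enabling cheap schedules.
R = beta * (np.ones((K, K)) / K - np.eye(K))
Q_cumulative_10 = expm(10 * R)           # roughly "ten steps at once"

rng = np.random.default_rng(0)
x0 = rng.integers(0, K, size=(16,))
x1 = forward_step(x0, rng)
```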
    Getting to the Point: Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
    Adam Paszke
    Alexey Radul
    David Duvenaud
    Dimitrios Vytiniotis
    Dougal Maclaurin
    Jonathan Ragan-Kelley
    Matthew Johnson
    Proceedings of the ACM on Programming Languages (PACMPL), 5 (2021), 88:1-88:29
    Abstract: We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operations, we argue for an explicit nested indexing style that mirrors application of functions to arguments. We also introduce a fine-grained typed effects system which affords concise and automatically-parallelized in-place updates. Specifically, an associative accumulation effect allows reverse-mode automatic differentiation of in-place updates in a way that preserves parallelism. Empirically, we benchmark against the Futhark array programming language, and demonstrate that aggressive inlining and type-driven compilation allows array programs to be written in an expressive, "pointful" style with little performance penalty.
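The paper describes the design of an array language (Dex). As a loose Python analogy rather than Dex code, the sketch below treats an array as an eagerly memoized function over a finite index set, so that indexing plays the role of function application and currying corresponds to partial indexing.

```python
# Rough Python analogy (not Dex) of "arrays as eagerly-memoized functions
# on typed index sets": build an array by tabulating a function over an
# index set, and recover function application as indexing.
import numpy as np

def tabulate(index_set, fn):
    """Eagerly memoize `fn` over a finite index set into an array."""
    return np.array([fn(i) for i in index_set])

n = range(10)
squares = tabulate(n, lambda i: i * i)    # analogous to "for i. i * i"

# Currying analogy: a function of two indices becomes a 2-D array,
# and partially applying the first index is just row indexing.
m = range(5)
table = np.array([[i * 10 + j for j in m] for i in n])
row_3 = table[3]                          # partial application at i = 3
```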
    Abstract: Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., bonds in chemical molecules or abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that this model outperforms baseline methods even without providing the hand-engineered semantic edges that those baselines use.
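As a much-simplified analogue of deriving a new weighted edge type from walks on a base graph, the sketch below uses a single automaton state, a fixed unrolling horizon, and hand-picked halting probabilities. The actual GFSA layer learns automaton policies over multiple states and trains them with implicit differentiation; this toy only conveys how walk statistics can become a new adjacency matrix.

```python
# Toy, simplified analogue of turning walks on a base graph into a new
# weighted edge type. Single automaton state, fixed horizon, and made-up
# halting probabilities; not the GFSA layer itself.
import numpy as np

def derived_edges(adjacency, halt_prob, num_steps=10):
    """W[u, v] = prob. a walk started at u halts at v within num_steps."""
    n = adjacency.shape[0]
    # Row-normalize the base graph into a random-walk transition matrix.
    P = adjacency / np.clip(adjacency.sum(axis=1, keepdims=True), 1e-9, None)
    halt = np.diag(halt_prob)              # halt (emit edge) at current node
    move = np.diag(1.0 - halt_prob) @ P    # otherwise follow a random edge
    W = np.zeros((n, n))
    reach = np.eye(n)                      # distribution over current node
    for _ in range(num_steps):
        W += reach @ halt                  # mass that halts this step
        reach = reach @ move               # mass that keeps walking
    return W                               # new weighted adjacency matrix

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
W = derived_edges(A, halt_prob=np.full(4, 0.3))
```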