Rif A. Saurous
Authored Publications
Sequential Monte Carlo Learning for Time Series Structure Discovery
Feras Saad
Matthew D. Hoffman
Vikash Mansinghka
Proceedings of the 40th International Conference on Machine Learning (2023), pp. 29473-29489
This paper presents a new approach to automatically discovering accurate
models of complex time series data. Working within a Bayesian nonparametric
prior over a symbolic space of Gaussian process time series models, we
present a novel structure learning algorithm that integrates sequential
Monte Carlo (SMC) and involutive MCMC for highly effective posterior
inference. Our method can be used both in "online" settings, where new
data is incorporated sequentially in time, and in "offline" settings, by
using nested subsets of historical data to anneal the posterior. Empirical
measurements on a variety of real-world time series show that our method
can deliver 10x--100x runtime speedups over previous MCMC and greedy-search
structure learning algorithms for the same model family. We use our method
to perform the first large-scale evaluation of Gaussian process time series
structure learning on a widely used benchmark of 1,428 monthly econometric
datasets, showing that our method discovers sensible models that deliver
more accurate point forecasts and interval forecasts over multiple horizons
as compared to prominent statistical and neural baselines that struggle on
this challenging data.
Automatically batching control-intensive programs for modern accelerators
Alexey Radul
Dougal Maclaurin
Matthew D. Hoffman
Third Conference on Systems and Machine Learning, Austin, TX (2020)
We present a general approach to batching arbitrary computations for
GPU and TPU accelerators. We demonstrate the effectiveness of our
method with orders-of-magnitude speedups on the No U-Turn Sampler
(NUTS), a workhorse algorithm in Bayesian statistics. The central
challenge of batching NUTS and other Markov chain Monte Carlo
algorithms is data-dependent control flow and recursion. We overcome
this by mechanically transforming a single-example implementation into
a form that explicitly tracks the current program point for each batch
member, and only steps forward those in the same place. We present
two different batching algorithms: a simpler, previously published one
that inherits recursion from the host Python, and a more complex,
novel one that implements recursion directly and can batch across it.
We implement these batching methods as a general program
transformation on Python source. Both the batching system and the
NUTS implementation presented here are available as part of the
popular TensorFlow Probability software package.
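The program-point idea in this abstract can be illustrated with a toy example. The following is a minimal NumPy sketch, not the TensorFlow Probability implementation: the function names `collatz_steps` and `batched_collatz_steps`, the choice of the Collatz iteration as the control-intensive program, and the rule for picking which program point to run next are all assumptions made for illustration. Each batch member carries its own program counter, and on every pass only the members waiting at the chosen program point are stepped.

```python
import numpy as np

def collatz_steps(n):
    """Single-example program with a data-dependent loop and branch."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

# Program points (basic blocks) of the single-example program above.
LOOP_TEST, EVEN_BRANCH, ODD_BRANCH, DONE = 0, 1, 2, 3

def batched_collatz_steps(ns):
    """Batched version: one program counter per batch member."""
    n = np.array(ns, dtype=np.int64)
    steps = np.zeros_like(n)
    pc = np.full(n.shape, LOOP_TEST)
    while np.any(pc != DONE):
        # Pick one program point that still has members waiting at it
        # (here simply the lowest-numbered one; an illustrative choice).
        block = pc[pc != DONE].min()
        active = pc == block
        if block == LOOP_TEST:
            # Evaluate the loop condition and branch condition for
            # the active members and record where each one goes next.
            pc[active] = np.where(
                n[active] == 1, DONE,
                np.where(n[active] % 2 == 0, EVEN_BRANCH, ODD_BRANCH))
        elif block == EVEN_BRANCH:
            n[active] //= 2
            steps[active] += 1
            pc[active] = LOOP_TEST
        else:  # ODD_BRANCH
            n[active] = 3 * n[active] + 1
            steps[active] += 1
            pc[active] = LOOP_TEST
    return steps
```

Members that take different branches or finish at different times never block one another permanently; they simply wait until their program point is selected.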
Estimating the Changing Infection Rate of COVID-19 Using Bayesian Models of Mobility
Xue Ben
Shawn O'Banion
Matthew D. Hoffman
medRxiv, https://www.medrxiv.org/content/10.1101/2020.08.06.20169664v1.full (2020)
In order to prepare for and control the continued spread of the COVID-19 pandemic while minimizing its economic impact, the world needs to be able to estimate and predict COVID-19’s spread.
Unfortunately, we cannot directly observe the prevalence or growth rate of COVID-19; these must be inferred using some kind of model.
We propose a hierarchical Bayesian extension to the classic susceptible-exposed-infected-removed (SEIR) compartmental model that adds compartments to account for isolation and death and allows the infection rate to vary as a function of both mobility data collected from mobile phones and a latent time-varying factor that accounts for changes in behavior not captured by mobility data. Since confirmed-case data is unreliable, we infer the model’s parameters conditioned on deaths data. We replace the exponential-waiting-time assumption of classic compartmental models with Erlang distributions, which allows for a more realistic model of the long lag between exposure and death. The mobility data gives us a leading indicator that can quickly detect changes in the pandemic’s local growth rate and forecast changes in death rates weeks ahead of time. This is an analysis of observational data, so any causal interpretations of the model’s inferences should be treated as suggestive at best; nonetheless, the model’s inferred relationship between different kinds of trips and the infection rate does suggest some possible hypotheses about what kinds of activities might contribute most to COVID-19’s spread.
Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision
Proceedings of ICASSP 2020 (2020) (to appear)
Humans do not acquire perceptual abilities the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies far more on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and recognition that combines (i) a self-supervised objective based on a general notion of unimodal and cross-modal coincidence, (ii) a novel clustering objective that reflects our need to impose categorical structure on our experiences, and (iii) a cluster-based active learning procedure that solicits targeted weak supervision to consolidate hypothesized categories into relevant semantic classes. By jointly training a single sound embedding/clustering/classification network according to these criteria, we achieve a new state-of-the-art unsupervised audio representation and demonstrate up to 20-fold reduction in labels required to reach a desired classification performance.
Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging
Qingqing Huang
Li Zhang
John Roberts Anderson
ICASSP 2020 (2020)
We explore content-based representation learning strategies tailored for
large-scale, uncurated music collections that afford only weak supervision
through unstructured natural language metadata and co-listen statistics. At the
core is a hybrid training scheme that uses classification and metric learning
losses to incorporate both metadata-derived text labels and aggregate co-listen
supervisory signals into a single convolutional model. The resulting joint text
and audio content embedding defines a similarity metric and supports prediction
of semantic text labels using a vocabulary of unprecedented granularity, which
we refine using a novel word-sense disambiguation procedure. As input to simple
classifier architectures, our representation achieves state-of-the-art
performance on two music tagging benchmarks.
Differentiable Consistency Constraints for Improved Deep Speech Enhancement
Jeremy Thorpe
Michael Chinen
IEEE International Conference on Acoustics, Speech, and Signal Processing (2019)
In recent years, deep networks have led to dramatic improvements in speech enhancement by framing it as a data-driven pattern recognition problem. In many modern enhancement systems, large amounts of data are used to train a deep network to estimate masks for complex-valued short-time Fourier transforms (STFTs) to suppress noise and preserve speech. However, current masking approaches often neglect two important constraints: STFT consistency and mixture consistency. Without STFT consistency, the system’s output is not necessarily the STFT of a time-domain signal, and without mixture consistency, the sum of the estimated sources does not necessarily equal the input mixture. Furthermore, the only previous approaches that apply mixture consistency use real-valued masks; mixture consistency has been ignored for complex-valued masks. In this paper, we show that STFT consistency and mixture consistency can be jointly imposed by adding simple differentiable projection layers to the enhancement network. These layers are compatible with real or complex-valued masks. Using both of these constraints with complex-valued masks provides a 0.7 dB increase in scale-invariant signal-to-distortion ratio (SI-SDR) on a large dataset of speech corrupted by a wide variety of nonstationary noise across a range of input SNRs.
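The mixture-consistency constraint described in this abstract can be sketched as a projection. The example below is a minimal NumPy illustration under assumptions: the function name `mixture_consistency` is hypothetical, and it uses the equal-weight projection that spreads the residual evenly across sources, which may differ from the weighting used in the paper. Because the operation is affine in the estimates, it is differentiable and works for real or complex-valued arrays.

```python
import numpy as np

def mixture_consistency(est_sources, mixture):
    """Project source estimates so that they sum to the mixture.

    est_sources: array of shape (num_sources, ...), real or complex.
    mixture:     array of shape (...), the observed input mixture.
    """
    num_sources = est_sources.shape[0]
    # Part of the mixture not explained by the current estimates.
    residual = mixture - est_sources.sum(axis=0)
    # Equal-weight assumption: share the residual evenly across sources.
    return est_sources + residual / num_sources
```

After the projection, the sources sum exactly to the mixture (up to floating-point error), and estimates that already sum to the mixture are left unchanged.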
Fixing a Broken ELBO
Alex Alemi
Ben Poole
Josh Dillon
Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholmsmässan, Stockholm, Sweden (2018), pp. 159-168
Recent work in unsupervised representation learning has focused on learning deep directed latent variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent variable models with powerful stochastic decoders do not ignore their latent code.
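The rate-distortion view in this abstract can be written out compactly. The notation below is an assumption chosen for illustration: $p^*(x)$ is the data distribution with entropy $H$, $e(z \mid x)$ the encoder, $d(x \mid z)$ the decoder, and $m(z)$ the marginal (prior) over the latent variable.

```latex
\begin{align}
D &= -\,\mathbb{E}_{p^*(x)}\,\mathbb{E}_{e(z \mid x)}\bigl[\log d(x \mid z)\bigr]
  && \text{(distortion)} \\
R &= \mathbb{E}_{p^*(x)}\bigl[\mathrm{KL}\bigl(e(z \mid x)\,\|\,m(z)\bigr)\bigr]
  && \text{(rate)} \\
H - D &\le I(X; Z) \le R
  && \text{(bounds on mutual information)} \\
\mathrm{ELBO} &= -(D + R)
\end{align}
```

Since the ELBO depends only on the sum $D + R$, every point on a line $D + R = c$ in the rate-distortion plane has the same ELBO, even though a model with high rate and low distortion behaves very differently from one with low rate and high distortion.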
Neumann Optimizer: A Practical Optimizer for Deep Neural Networks
Shankar Krishnan
Ying Xiao
International Conference on Learning Representations (ICLR) (2018)
Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and is also able to scale up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception V3, Resnet-50, Resnet-101 and Inception-Resnet) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others -- only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer.
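The optimizer's name points to the Neumann series, and the idea of obtaining an inverse-Hessian direction without inverting a matrix can be sketched with the classical identity H^{-1} = eta * sum_k (I - eta*H)^k, which holds when the spectral radius of (I - eta*H) is below 1. The example below is only that textbook identity applied to a small explicit matrix; it is not the paper's algorithm, and the function name `neumann_inverse_apply` and its parameters are assumptions for illustration.

```python
import numpy as np

def neumann_inverse_apply(hessian, grad, eta, num_terms):
    """Approximate H^{-1} g with a truncated Neumann series.

    Uses H^{-1} g = sum_{k>=0} (I - eta*H)^k (eta*g), which converges
    when every eigenvalue of (I - eta*H) has magnitude below 1.
    """
    term = eta * grad              # k = 0 term of the series
    total = term.copy()
    for _ in range(num_terms - 1):
        # Multiply the previous term by (I - eta*H) using only a
        # matrix-vector product; H is never inverted.
        term = term - eta * (hessian @ term)
        total += term
    return total
```

Each additional term costs one Hessian-vector product, so the truncation length trades accuracy of the descent direction against computation.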
Simple, Distributed, and Accelerated Probabilistic Programming
Matthew D. Hoffman
Dave Moore
Christopher Gordon Suter
Srinivas Vasudevan
Alexey Radul
Matthew Johnson
NeurIPS (2018)
We describe Edward2, a low-level probabilistic programming language. Edward2 distills the core of probabilistic programming down to a single abstraction—the random variable. By blurring the line between model and computation, Edward2 enables numerous applications not shown before: a model-parallel variational auto-encoder (VAE) with tensor processing units (TPUs); a data-parallel autoregressive model (Image Transformer) with TPUs; and multi-GPU No-U-Turn Sampler (NUTS). Edward2 achieves an optimal linear speedup from 4 to 256 TPUs. With VAEs, Edward2 sees up to a 20x speedup on TPUs over Pyro and Edward on GPUs; with Bayesian neural networks, Edward2 sees up to a 51x speedup. With NUTS, Edward2 sees a 20x speedup on GPUs over Stan and 7x over PyMC3.
Unsupervised Learning of Semantic Audio Representations
Ratheet Pandya
Jiayang Liu
Proceedings of ICASSP 2018 (to appear)
Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds. We consider several class-agnostic semantic constraints that apply to unlabeled nonspeech audio: (i) noise and translations in time do not change the underlying sound category, (ii) a mixture of two sound events inherits the categories of the constituents, and (iii) the categories of events in close temporal proximity are likely to be the same or related. Without labels to ground them, these constraints are incompatible with classification loss functions. However, they may still be leveraged to identify geometric inequalities needed for triplet loss-based training of convolutional neural networks. The result is low-dimensional embeddings of the input spectrograms that recover 41% and 84% of the performance of their fully-supervised counterparts when applied to downstream query-by-example sound retrieval and sound event classification tasks, respectively. Moreover, in limited-supervision settings, our unsupervised embeddings double the state-of-the-art classification performance.
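The geometric inequalities mentioned in this abstract are what a triplet loss enforces: an anchor embedding should be closer to a positive (for example, a noisy or time-shifted copy of the same clip) than to a negative by some margin. The following is a minimal NumPy sketch of a standard hinge triplet loss on squared Euclidean distances; the function name `triplet_loss` and the margin value are assumptions, and the paper's exact loss and sampling scheme may differ.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge triplet loss on squared Euclidean distances.

    All inputs have shape (batch, embedding_dim). The loss is zero for a
    triplet once the negative is farther from the anchor than the
    positive by at least `margin` (an illustrative default).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)
```

No class labels appear anywhere in the loss; only the relative ordering of distances within each triplet matters, which is why constraints such as "noise does not change the category" can supply training signal.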