Joseph Antognini
I am a Google AI Resident. Prior to my work at Google, I worked in astronomy, studying the dynamics of few-body systems. My current research interests are threefold:
1. Applying deep learning to the audio domain. In particular, I am interested in the problem of fast spectrogram inversion (a sketch of the classical baseline follows this list).
2. Understanding the training dynamics of neural networks.
3. Applying deep learning to problems in astronomy.
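As a concrete illustration of the first interest, here is a minimal sketch, written with librosa (my own illustration; the test tone and STFT parameters are arbitrary choices), of the classical Griffin-Lim algorithm, the slow iterative baseline that fast spectrogram-inversion methods aim to speed up:

    import numpy as np
    import librosa

    # Build a short test signal and keep only its magnitude spectrogram,
    # discarding the phase information.
    sr = 22050
    t = np.linspace(0, 2.0, int(2.0 * sr), endpoint=False)
    y = 0.5 * np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone
    mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

    # Griffin-Lim alternates between the time and frequency domains, keeping
    # the target magnitudes and re-estimating the phase at each iteration.
    # Many iterations are typically needed, which is what makes fast
    # spectrogram inversion an interesting problem.
    y_rec = librosa.griffinlim(mag, n_iter=60, n_fft=1024, hop_length=256)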
Authored Publications
Abstract
Texture synthesis techniques based on matching the Gram matrix of feature activations in neural networks have achieved spectacular success in the image domain. In this paper we extend these techniques to the audio domain. We demonstrate that synthesizing diverse audio textures is challenging, and argue that this is because audio data is relatively low-dimensional. We therefore introduce two new terms to the original Gram-matrix loss: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures. We quantitatively study the impact of our design choices on the quality of the synthesized audio by introducing an audio analogue to the Inception loss which we term the VGGish loss. We show that there is a trade-off between the diversity and quality of the synthesized audio using this technique. Finally we perform a number of experiments to qualitatively study how these design choices impact the quality of the synthesized audio.
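To make the loss terms in this abstract concrete, here is a rough NumPy sketch (my own illustration, not the paper's released code; the feature shapes, lag range, and batch handling are assumptions, and the feature-extraction network and VGGish evaluation are omitted) of a Gram-matrix term, an autocorrelation term, and a diversity term over feature activations:

    import numpy as np

    def gram_loss(feats_synth, feats_target):
        """Match Gram matrices (channel-by-channel correlations) of features.

        feats_*: arrays of shape (channels, time), e.g. activations from one
        layer of a network applied to the synthesized / target audio.
        """
        def gram(f):
            return f @ f.T / f.shape[1]
        return np.mean((gram(feats_synth) - gram(feats_target)) ** 2)

    def autocorrelation_loss(feats_synth, feats_target, max_lag=32):
        """Match per-channel autocorrelations at several lags to preserve rhythm."""
        def autocorr(f, lag):
            return np.mean(f[:, :-lag] * f[:, lag:], axis=1)
        loss = 0.0
        for lag in range(1, max_lag + 1):
            loss += np.mean(
                (autocorr(feats_synth, lag) - autocorr(feats_target, lag)) ** 2)
        return loss / max_lag

    def diversity_loss(feats_synth_batch):
        """Penalize pairwise similarity between syntheses in a batch, pushing
        the optimizer toward distinct textures.

        feats_synth_batch: array of shape (batch, channels, time).
        """
        flat = feats_synth_batch.reshape(feats_synth_batch.shape[0], -1)
        flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
        sims = flat @ flat.T                      # pairwise cosine similarities
        off_diag = sims - np.diag(np.diag(sims))  # ignore self-similarity
        return np.mean(off_diag ** 2)

In a full synthesis loop these three terms would be weighted and summed, and the synthesized audio optimized by gradient descent against that total loss.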
Abstract
One technique to visualize the training of neural networks is to perform PCA on the parameters over the course of training and to project to the subspace spanned by the first few PCA components. In this paper we compare this technique to the PCA of a high-dimensional random walk. We compute the eigenvalues and eigenvectors of the covariance of the trajectory and prove that in the long-trajectory and high-dimensional limit most of the variance is in the first few PCA components, and that the projection of the trajectory onto any subspace spanned by PCA components is a Lissajous curve. We generalize these results to a random walk with momentum and to an Ornstein-Uhlenbeck process (i.e., a random walk in a quadratic potential) and show that in high dimensions the walk is not mean-reverting, but will instead be trapped at a fixed distance from the minimum. We finally compare the distribution of PCA variances and the PCA-projected training trajectories of a linear model trained on CIFAR-10 and ResNet-50-v2 trained on ImageNet and find that the distribution of PCA variances resembles that of a random walk with drift.
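A small NumPy sketch (my own illustration; the dimensionality and number of steps are arbitrary) of the random-walk side of the comparison described above: PCA of a high-dimensional random-walk trajectory, the fraction of variance captured by the leading components, and the projection onto the first two components, which traces out a Lissajous-like curve:

    import numpy as np

    rng = np.random.default_rng(0)
    n_steps, dim = 2000, 1000

    # Random walk: cumulative sum of i.i.d. Gaussian steps in `dim` dimensions.
    trajectory = np.cumsum(rng.standard_normal((n_steps, dim)), axis=0)

    # PCA via SVD of the mean-centered trajectory.
    centered = trajectory - trajectory.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / np.sum(singular_values ** 2)
    print("Fraction of variance in first 5 PCA components:", explained_variance[:5])

    # Projection onto the first two PCA components; for a long walk in high
    # dimensions this traces a Lissajous-like curve.
    projection = centered @ components[:2].T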
Measuring the Effects of Data Parallelism on Neural Network Training
Chris Shallue
Jaehoon Lee
Jascha Sohl-Dickstein
Journal of Machine Learning Research (JMLR) (2018)
Abstract
Recent hardware developments have made unprecedented amounts of data parallelism available for accelerating neural network training. Among the simplest ways to harness next-generation accelerators is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error. Eventually, increasing the batch size will no longer reduce the number of training steps required, but the exact relationship between the batch size and how many training steps are necessary is of critical importance to practitioners, researchers, and hardware designers alike. We study how this relationship varies with the training algorithm, model, and data set and find extremely large variation between workloads. Along the way, we reconcile disagreements in the literature on whether batch size affects model quality. Finally, we discuss the implications of our results for efforts to train neural networks much faster in the future.
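The measurement described here, the number of training steps needed to first reach a goal out-of-sample error as a function of batch size, can be illustrated with a toy, self-contained sketch (my own; a small logistic-regression problem on synthetic data stands in for the paper's workloads, and the learning rate and error target are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = rng.standard_normal(20)

    def make_data(n):
        x = rng.standard_normal((n, 20))
        y = (x @ true_w + 0.5 * rng.standard_normal(n) > 0).astype(float)
        return x, y

    x_train, y_train = make_data(10_000)
    x_val, y_val = make_data(2_000)

    def steps_to_target(batch_size, lr=0.5, target_error=0.10, max_steps=20_000):
        """Train with minibatch SGD and return the first step at which the
        held-out error drops to the target, or None if it never does."""
        w = np.zeros(20)
        for step in range(1, max_steps + 1):
            idx = rng.integers(0, len(x_train), size=batch_size)
            logits = x_train[idx] @ w
            grad = x_train[idx].T @ (1 / (1 + np.exp(-logits)) - y_train[idx]) / batch_size
            w -= lr * grad
            val_error = np.mean((x_val @ w > 0) != y_val)
            if val_error <= target_error:
                return step
        return None

    for bs in [8, 32, 128, 512, 2048]:
        print(f"batch size {bs:5d}: steps to 10% validation error = {steps_to_target(bs)}")

Repeating this measurement across batch sizes traces out the kind of steps-versus-batch-size curve the paper studies, including the regime where further increases in batch size stop reducing the number of steps required.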