Yann N. Dauphin
Yann N. Dauphin is a machine learning researcher at Google Research working on understanding the fundamentals of deep learning algorithms and applying those insights in applications. Prior to joining Google in 2019, he was a researcher at Facebook AI Research from 2015 to 2018, where his work led to award-winning scientific publications and helped improve automatic translation on Facebook.com. He received his PhD from the Université de Montréal under the supervision of Prof. Yoshua Bengio. During that time, he and his team won international machine learning competitions such as the Unsupervised and Transfer Learning Challenge in 2011.
Authored Publications
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
Yani Ioannou
Cem Keskin
AAAI Conference on Artificial Intelligence (2022)
Abstract
Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why does training unstructured sparse networks from random initialization perform poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from, though this comes at the cost of learning novel solutions.
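As a concrete illustration of the initialization issue, here is a minimal NumPy sketch (not the paper's exact method) of a He-style initialization rescaled by each unit's actual fan-in under an unstructured sparsity mask; all names are illustrative:

```python
import numpy as np

def sparse_aware_init(mask, rng=np.random.default_rng()):
    """He-style init scaled by each output unit's *sparse* fan-in.

    mask: (out_features, in_features) binary connectivity mask.
    A standard dense init uses std = sqrt(2 / in_features) everywhere,
    which mis-scales units whose effective fan-in is much smaller.
    """
    fan_in = np.maximum(mask.sum(axis=1, keepdims=True), 1)  # per-unit sparse fan-in
    std = np.sqrt(2.0 / fan_in)                              # He scaling per unit
    weights = rng.standard_normal(mask.shape) * std
    return weights * mask                                    # keep only active connections
```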
Continental-scale building detection from high resolution satellite imagery
Wojciech Sirko
Yasser Salah Eddine Bouchareb
Maxim Neumann
Moustapha Cisse
arXiv (2021)
Abstract
Identifying the locations and footprints of buildings is vital for many practical and scientific purposes, and such information can be particularly useful in developing regions where alternative data sources may be scarce. In this work, we describe a model training pipeline for detecting buildings across the entire continent of Africa, given 50 cm satellite imagery. Starting with the U-Net model, widely used in satellite image analysis, we study variations in architecture, loss functions, regularization, pre-training, self-training and post-processing that increase instance segmentation performance. Experiments were carried out using a dataset of 100k satellite images across Africa containing 1.75M manually labelled building instances, and further datasets for pre-training and self-training. We report novel methods for improving the performance of building detection with this type of model, including the use of mixup (mAP +0.12) and self-training with soft KL loss (mAP +0.06). The resulting pipeline obtains good results even in a wide variety of challenging rural and urban contexts, and was used to create the Open Buildings dataset of approximately 600M Africa-wide building footprints.
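Of the reported improvements, mixup is simple enough to sketch. The following is a minimal, hedged illustration of the standard mixup recipe (a Beta-sampled convex combination of inputs and targets), not the paper's exact pipeline:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=np.random.default_rng()):
    """Blend two (image, target-mask) training pairs.

    lam ~ Beta(alpha, alpha) controls the mixing; targets become soft,
    which is compatible with a cross-entropy-style segmentation loss.
    """
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # blended input image
    y = lam * y1 + (1.0 - lam) * y2   # blended (soft) target
    return x, y
```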
Abstract
Deep neural network pruning and quantization techniques have demonstrated that it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weights have comparable top-line performance metrics but diverge considerably in behavior on a narrow subset of the dataset. This small subset of data points, which we term Pruning Identified Exemplars (PIEs), is systematically more impacted by the introduction of sparsity. Compression disproportionately impacts model performance on the underrepresented long tail of the data distribution. PIEs over-index on atypical or noisy images that are far more challenging for both humans and algorithms to classify. Our work provides intuition into the role of capacity in deep neural networks and the trade-offs incurred by compression. An understanding of this disparate impact is critical given the widespread deployment of compressed models in the wild.
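A minimal sketch of how PIEs could be flagged, assuming (one natural reading of the abstract) that an example is a PIE when the modal prediction of a population of compressed models disagrees with that of the uncompressed models; the function names are illustrative:

```python
from collections import Counter

def modal_label(labels):
    """Most common predicted class across a population of models."""
    return Counter(labels).most_common(1)[0][0]

def find_pies(dense_preds, compressed_preds):
    """dense_preds / compressed_preds: [num_models][num_examples] class ids.

    An example is a PIE when the two model populations disagree on its
    modal predicted label.
    """
    num_examples = len(dense_preds[0])
    return [
        i for i in range(num_examples)
        if modal_label([p[i] for p in dense_preds])
        != modal_label([p[i] for p in compressed_preds])
    ]
```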
MetaInit: Initializing learning by learning to initialize
Samuel S. Schoenholz
Advances in Neural Information Processing Systems 32, Curran Associates, Inc. (2019), pp. 12624-12636
Abstract
Deep learning models frequently trade handcrafted features for deep features learned with much less human intervention using gradient descent. While this paradigm has been enormously successful, deep networks are often difficult to train and performance can depend crucially on the initial choice of parameters. In this work, we introduce an algorithm called MetaInit as a step towards automating the search for good initializations using meta-learning. Our approach is based on a hypothesis that good initializations make gradient descent easier by starting in regions that look locally linear with minimal second-order effects. We formalize this notion via a quantity that we call the gradient quotient, which can be computed with any architecture or dataset. MetaInit minimizes this quantity efficiently by using gradient descent to tune the norms of the initial weight matrices. We conduct experiments on plain and residual networks and show that the algorithm can automatically recover from a class of bad initializations. MetaInit allows us to train networks and achieve performance competitive with the state of the art without batch normalization or residual connections. In particular, we find that this approach outperforms normalization for networks without skip connections on CIFAR-10 and can scale to ResNet-50 models on ImageNet.
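A minimal sketch of the gradient quotient under one plausible reading of the abstract: compare the gradient after a full gradient step against the current gradient, so a locally linear loss (small second-order effects) yields a quotient near zero. The exact functional form and MetaInit's outer norm-tuning loop are simplified away; `grad_fn` is an assumed user-supplied gradient oracle:

```python
import numpy as np

def gradient_quotient(grad_fn, theta, eps=1e-5):
    """grad_fn: callable mapping a flat parameter vector to dL/dtheta.

    If the loss were exactly locally linear, the gradient would be
    unchanged after a step and the quotient would be zero.
    """
    g1 = grad_fn(theta)
    g2 = grad_fn(theta - g1)                        # gradient after one step
    denom = np.where(g1 >= 0, g1 + eps, g1 - eps)   # sign-consistent epsilon
    return float(np.mean(np.abs(g2 / denom - 1.0)))
```

Per the abstract, MetaInit then runs gradient descent on this quantity with respect to the norms of the initial weight matrices; that outer loop is omitted here.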
Abstract
Neural network pruning techniques have demonstrated that it is possible to remove the majority of weights in a network with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by pruning. We find that certain examples, which we term Pruning Identified Exemplars (PIEs), and certain classes are systematically more impacted by the introduction of sparsity. Removing PIE images from the test set greatly improves top-1 accuracy for both pruned and non-pruned models. These hard-to-generalize-to images tend to be mislabelled, of lower image quality, to depict multiple objects, or to require fine-grained classification. These findings shed light on previously unknown trade-offs and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.
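A minimal sketch of the evaluation described above: top-1 accuracy on the full test set versus the test set with PIE images removed. Here `pie_indices` would come from a PIE-detection step like the one sketched earlier:

```python
import numpy as np

def top1_accuracy(preds, labels):
    """preds, labels: 1-D integer arrays of predicted / true class ids."""
    return float(np.mean(preds == labels))

def accuracy_without_pies(preds, labels, pie_indices):
    """Top-1 accuracy after dropping the PIE examples from the test set."""
    keep = np.setdiff1d(np.arange(len(labels)), pie_indices)
    return top1_accuracy(preds[keep], labels[keep])
```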