Hugo Larochelle
I am a Principal Scientist in the Google DeepMind team in Montreal. My main area of expertise is deep learning. My previous work includes unsupervised pretraining with autoencoders, denoising autoencoders, visual attention-based classification, neural autoregressive distribution models and zero-shot learning. More broadly, I’m interested in applications of deep learning to natural language processing, code, computer vision and environmental sustainability problems.
Previously, I was Associate Professor at the Université de Sherbrooke (UdeS). I also co-founded Whetlab, which was acquired in 2015 by Twitter, where I then worked as a Research Scientist in the Twitter Cortex group. From 2009 to 2011, I was also a member of the machine learning group at the University of Toronto, as a postdoctoral fellow under the supervision of Geoffrey Hinton. I obtained my Ph.D. at the Université de Montréal, under the supervision of Yoshua Bengio.
My academic involvement includes being a member of the boards for the International Conference on Machine Learning (ICML) and for the Neural Information Processing Systems (NeurIPS) conference. I also co-founded the journal Transactions on Machine Learning Research.
Finally, I have a popular online course on deep learning and neural networks, freely accessible on YouTube.
Authored Publications
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
Rishab Goel
International Conference on Learning Representations (ICLR) (2023)
Abstract
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a "static" setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. As an alternative, we develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and "learns to execute" descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
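A minimal sketch of the "learning to execute" intuition behind an interpreter-inspired model (not the paper's exact architecture): statements are embedded, a soft program counter is propagated for a fixed number of steps, and the final machine state is classified into error kinds. The module names, the toy control flow, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftInterpreterSketch(nn.Module):
    """Propagate a soft program counter over statement embeddings, then classify
    the final "machine state" into error kinds (absence of error being one kind)."""
    def __init__(self, vocab_size, dim=128, num_error_kinds=10, max_steps=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.step = nn.GRUCell(dim, dim)               # updates the machine state
        self.branch = nn.Linear(dim, 2)                # fall through vs. jump back (toy)
        self.classify = nn.Linear(dim, num_error_kinds)
        self.max_steps = max_steps

    def forward(self, statement_tokens):
        # statement_tokens: (num_statements,), one token id per statement (toy encoding)
        stmt = self.embed(statement_tokens)            # (S, dim)
        S, dim = stmt.shape
        pc = torch.zeros(S); pc[0] = 1.0               # soft program counter
        state = torch.zeros(dim)
        for _ in range(self.max_steps):
            current = pc @ stmt                        # expected embedding of the current statement
            state = self.step(current.unsqueeze(0), state.unsqueeze(0)).squeeze(0)
            p = F.softmax(self.branch(state), dim=-1)  # (2,) soft control-flow decision
            restart = torch.zeros(S); restart[0] = 1.0
            pc = p[0] * torch.roll(pc, shifts=1) + p[1] * restart
        return self.classify(state)                    # logits over error kinds

model = SoftInterpreterSketch(vocab_size=1000)
logits = model(torch.randint(0, 1000, (12,)))          # a 12-statement toy program
```

Because the program counter stays soft, the whole rollout is differentiable, which is what allows training such a model end-to-end from error labels alone.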
Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Mike Mozer
Proceedings of the 39th International Conference on Machine Learning, PMLR (2022)
Abstract
Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method, fine-tuning all parameters of the source model to the target domain, possibly because fine-tuning allows the model to leverage useful information from intermediate layers which is otherwise discarded. We explore the hypothesis that these intermediate layers might be directly exploited by linear probing. We propose a method, Head2Toe, that selects features from all layers of the source model to train a target-domain classification head. In evaluations on the Visual Task Adaptation Benchmark, Head2Toe matches performance obtained with fine-tuning on average, but critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.
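A rough sketch of the core recipe, under stated assumptions: concatenate activations from every layer of a frozen backbone, keep a small subset of features, and train only a linear head. The toy MLP backbone and the variance-based selection score below are placeholders; the paper's selection criterion is more principled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen source model: a toy MLP whose intermediate activations we expose.
backbone = nn.ModuleList([nn.Linear(784, 512), nn.Linear(512, 256), nn.Linear(256, 128)])
for p in backbone.parameters():
    p.requires_grad_(False)

def all_layer_features(x):
    """Concatenate activations from every layer ("head to toe"), not just the last one."""
    feats = []
    for layer in backbone:
        x = torch.relu(layer(x))
        feats.append(x)
    return torch.cat(feats, dim=1)                     # (batch, 512 + 256 + 128)

# Small target-domain training set (placeholder data).
x_train = torch.randn(100, 784)
y_train = torch.randint(0, 10, (100,))
feats = all_layer_features(x_train)

# Feature selection (toy score: per-feature variance on the target training set).
keep = feats.var(dim=0).topk(k=64).indices

# Train only a linear head on the selected features; the backbone is never updated.
head = nn.Linear(64, 10)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(100):
    loss = F.cross_entropy(head(feats[:, keep]), y_train)
    opt.zero_grad(); loss.backward(); opt.step()
```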
Abstract
Few-shot classification aims to recognize unseen classes given only a few samples. We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting is effectively integrating the feature representations from the diverse set of training domains. Here, we propose a Universal Representation Transformer (URT) layer that meta-learns to leverage universal features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset: it outperforms the best previous model on 3 data sources and matches it on the others. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization.
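A hedged, single-head sketch of the re-weighting idea: attention scores over per-domain backbone features produce one universal representation per image. The actual layer uses multiple heads and conditions its query on the episode; the dimensions and the fixed learned query below are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class URTLayerSketch(nn.Module):
    """Single-head sketch: re-weight domain-specific features into a universal one."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))    # learned query (task-agnostic here)
        self.key = nn.Linear(dim, dim)

    def forward(self, domain_feats):
        # domain_feats: (num_domains, num_images, dim), one backbone per training domain
        domain_summary = domain_feats.mean(dim=1)                 # (num_domains, dim)
        scores = self.key(domain_summary) @ self.query            # (num_domains,)
        weights = F.softmax(scores, dim=0)
        # universal representation: attention-weighted blend of domain-specific features
        return torch.einsum('d,dnf->nf', weights, domain_feats)   # (num_images, dim)

urt = URTLayerSketch(dim=512)
feats = torch.randn(8, 25, 512)          # 8 training domains, 25 support images
universal = urt(feats)                   # (25, 512), then used for prototype-based classification
```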
Impact of Aliasing on Generalization in Deep Convolutional Networks
Nicolas Le Roux
Rob Romijnders
International Conference on Computer Vision ICCV 2021, IEEE/CVF (2021)
Abstract
Traditionally, image pre-processing in the frequency domain played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. With the advent of large datasets, however, many practitioners concluded that such pre-processing was unnecessary, believing that these priors can be learned from the data itself if they aid in achieving stronger performance. Frequency aliasing is a phenomenon that may occur when down-sampling (sub-sampling) any signal, such as an image or feature map. We demonstrate that substantial improvements in out-of-distribution (OOD) generalization can be obtained by mitigating the effects of aliasing: placing non-trainable blur filters and using smooth activation functions at key locations in the ResNet family of architectures helps achieve new state-of-the-art results on two benchmarks without any hyper-parameter sweeps.
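A minimal illustration of the mitigation described above, assuming a fixed 3x3 binomial blur filter and GELU as the smooth activation; the paper's exact filter sizes and placements may differ.

```python
import torch
import torch.nn.functional as F

def blur_pool(x, stride=2):
    """Low-pass filter with a fixed 3x3 binomial kernel, then subsample.
    Compared with plain strided downsampling, this suppresses frequency aliasing."""
    kernel = torch.tensor([[1., 2., 1.],
                           [2., 4., 2.],
                           [1., 2., 1.]])
    kernel = kernel / kernel.sum()
    c = x.shape[1]
    weight = kernel.expand(c, 1, 3, 3)           # one non-trainable filter per channel
    x = F.pad(x, (1, 1, 1, 1), mode='reflect')
    return F.conv2d(x, weight, stride=stride, groups=c)

x = torch.randn(1, 64, 32, 32)
y = F.gelu(x)            # smooth activation instead of ReLU
y = blur_pool(y)         # anti-aliased downsampling, shape (1, 64, 16, 16)
```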
A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Neil Houlsby
Xiaohua Zhai
Sylvain Gelly
NeurIPS Datasets and Benchmarks Track (2021)
Abstract
Meta-learning and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we introduce a few-shot classification evaluation protocol named VTAB+MD with the explicit goal of facilitating the sharing of insights between the two communities. We demonstrate its accessibility in practice by performing a cross-family study of the best transfer and meta learners that report results on both a large-scale meta-learning benchmark (Meta-Dataset, MD) and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. We hope that this work contributes to accelerating progress on few-shot learning research.
Abstract
Few-shot dataset generalization is a challenging variant of the well-studied few-shot classification problem, where a diverse training set of several datasets is given, for the purpose of training an adaptable model that can then learn classes from new datasets using only a few examples. To this end, we propose to utilize the diverse training set to construct a universal template: a structure that can define a wide array of dataset-specialized models, by plugging in appropriate parameter-light components. For each new few-shot classification problem, our approach therefore only requires inferring a small number of task-specific parameters to insert into the universal template. We design a separate network that produces a carefully crafted initialization of those parameters for each given task, and we then fine-tune its proposed initialization via a few steps of gradient descent. Our approach is more parameter-efficient, scalable and adaptable compared to previous methods, and achieves state-of-the-art performance on the challenging Meta-Dataset benchmark.
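A hedged sketch of the universal-template idea: a shared, fixed convolutional backbone is specialized per task by a handful of FiLM-style scale-and-shift parameters, which are then fine-tuned with a few gradient steps on the support set. In the paper a separate network proposes the initial values per task; here they simply start at identity, and all shapes, step counts, and learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FiLMBlock(nn.Module):
    """Shared conv block; per-task (gamma, beta) parameters scale and shift its features."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)

    def forward(self, x, gamma, beta):
        h = self.conv(x)
        return F.relu(gamma.view(1, -1, 1, 1) * h + beta.view(1, -1, 1, 1))

class UniversalTemplateSketch(nn.Module):
    """Universal template: the heavy weights are shared; tasks only plug in FiLM params."""
    def __init__(self, channels=(3, 32, 64)):
        super().__init__()
        self.blocks = nn.ModuleList(
            FiLMBlock(c_in, c_out) for c_in, c_out in zip(channels[:-1], channels[1:]))

    def forward(self, x, film_params):
        for block, (gamma, beta) in zip(self.blocks, film_params):
            x = block(x, gamma, beta)
        return x.mean(dim=(2, 3))                      # pooled image embedding

template = UniversalTemplateSketch()
for p in template.parameters():
    p.requires_grad_(False)                            # the shared template stays fixed here

# Task-specific parameters are tiny: one (gamma, beta) pair per block, starting at identity.
film = [(torch.ones(32, requires_grad=True), torch.zeros(32, requires_grad=True)),
        (torch.ones(64, requires_grad=True), torch.zeros(64, requires_grad=True))]
head = nn.Linear(64, 5)                                # 5-way episode
opt = torch.optim.SGD([p for pair in film for p in pair] + list(head.parameters()), lr=0.1)

support_x, support_y = torch.randn(10, 3, 32, 32), torch.randint(0, 5, (10,))
for _ in range(5):                                     # a few steps of task adaptation
    loss = F.cross_entropy(head(template(support_x, film)), support_y)
    opt.zero_grad(); loss.backward(); opt.step()
```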
Abstract
The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator, a neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveals that, when given the same budget, our technique significantly improves the success rate over PCCoder [Zohar et al., 2018] and other ablation baselines.
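A rough sketch of combining per-example solutions with multi-head attention, in the spirit of the Cross Aggregator described above: the state of the global program attends over embeddings of the per-example programs to score the next statement. How programs are encoded as vectors and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CrossAggregatorSketch(nn.Module):
    """Attend over embeddings of per-example solution programs to decide the
    next statement of a global program meant to satisfy all examples."""
    def __init__(self, dim=128, num_heads=4, vocab_size=200):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.next_statement = nn.Linear(dim, vocab_size)

    def forward(self, global_state, per_example_solutions):
        # global_state: (batch, 1, dim), current state of the program being synthesized
        # per_example_solutions: (batch, num_examples, dim), one embedding per program
        # that solves a single input-output example
        fused, _ = self.attn(query=global_state,
                             key=per_example_solutions,
                             value=per_example_solutions)
        return self.next_statement(fused.squeeze(1))   # logits over candidate statements

agg = CrossAggregatorSketch()
state = torch.randn(8, 1, 128)
per_example = torch.randn(8, 5, 128)        # solutions for 5 I/O examples
logits = agg(state, per_example)            # (8, 200)
```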
Learning Graph Structure With A Finite-State Automaton Layer
Daniel Dunning Woo Johnson
Thirty-fourth Conference on Neural Information Processing Systems (2020)
Abstract
Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., bonds in chemical molecules or abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that this model outperforms baseline methods even without providing the hand-engineered semantic edges that those baselines use.
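A simplified, hedged sketch of the GFSA computation: per-edge-type adjacency matrices plus learned automaton transitions yield a new weighted adjacency matrix by accumulating the probability that a walk started at node i halts at node j. The real layer formulates this as a POMDP solved with implicit differentiation; the finite unrolling and the halting rule below are only for illustration.

```python
import torch
import torch.nn.functional as F

def gfsa_edges_sketch(adjacency, move_logits, accept_logits, num_unroll=8):
    """
    adjacency:     (num_edge_types, N, N) base-graph adjacency, one matrix per edge type
    move_logits:   (num_states, num_edge_types, num_states) automaton transition scores
    accept_logits: (num_states,) score for halting (and adding an edge) in each state
    Returns an (N, N) matrix of weights for the new learned edge type.
    """
    S, E, _ = move_logits.shape
    N = adjacency.shape[-1]
    # Normalize the base graph so each (edge type, node) row is a distribution.
    adj = adjacency / adjacency.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    move = F.softmax(move_logits.reshape(S, -1), dim=-1).reshape(S, E, S)
    accept = torch.sigmoid(accept_logits)

    # occupancy[s, i, j]: prob. that a walk started at node i is now at node j in state s
    occupancy = torch.zeros(S, N, N)
    occupancy[0] = torch.eye(N)                        # start in state 0 at the start node
    new_edges = torch.zeros(N, N)
    for _ in range(num_unroll):
        new_edges = new_edges + torch.einsum('sij,s->ij', occupancy, accept)
        cont = occupancy * (1 - accept).view(S, 1, 1)  # mass that keeps walking
        # one step: pick an (edge type, next state) pair, then follow that edge type
        occupancy = torch.einsum('sij,set,ejk->tik', cont, move, adj)
    return new_edges

adj = torch.rand(3, 10, 10)                 # 3 base edge types, 10 nodes
weights = gfsa_edges_sketch(adj,
                            move_logits=torch.randn(4, 3, 4),
                            accept_logits=torch.randn(4))
```

The returned matrix can be appended to the base graph as an extra edge type, which is the role the GFSA layer plays inside a larger graph model.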
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
Eleni Triantafillou
Tyler Zhu
Kelvin Xu
Carles Gelada
International Conference on Learning Representations (submission) (2020)
Abstract
Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have recently emerged to tackle this problem, we find the current procedures and datasets used to systematically assess progress in this setting to be lacking. To address this, we propose Meta-Dataset: a new benchmark for training and evaluating few-shot classifiers that is large-scale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of state-of-the-art models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure the robustness of current methods to variations in the number of available examples and the number of classes. Finally, our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular few-shot classification methods, and to propose a new method, ProtoMAML, which achieves improved performance on our benchmark.
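A hedged sketch of the ProtoMAML adaptation step mentioned above: the output layer is initialized from class prototypes (weights 2*c_k, biases -||c_k||^2, which makes the initial classifier behave like a Prototypical Network), and then the whole model is fine-tuned on the support set. The embedding network, step count, and learning rate are placeholders, and the outer meta-training loop is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))

def proto_maml_adapt(support_x, support_y, num_classes, inner_steps=5, lr=0.01):
    """Initialize the output layer from class prototypes, then fine-tune everything."""
    with torch.no_grad():
        z = embed(support_x)                                    # (n_support, 64)
        prototypes = torch.stack([z[support_y == k].mean(dim=0)
                                  for k in range(num_classes)]) # (num_classes, 64)
    head = nn.Linear(64, num_classes)
    with torch.no_grad():
        head.weight.copy_(2 * prototypes)                       # prototype-based init:
        head.bias.copy_(-(prototypes ** 2).sum(dim=1))          # equivalent to a ProtoNet
    params = list(embed.parameters()) + list(head.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(inner_steps):                                # MAML-style inner loop
        loss = F.cross_entropy(head(embed(support_x)), support_y)
        opt.zero_grad(); loss.backward(); opt.step()
    return head

support_x = torch.randn(25, 784)                 # a 5-way, 5-shot episode
support_y = torch.arange(5).repeat_interleave(5)
head = proto_maml_adapt(support_x, support_y, num_classes=5)
```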
The Hanabi Challenge: A New Frontier for AI Research
Nolan Bard
Jakob N. Foerster
Sarath Chandar
Neil Burch
Marc Lanctot
H. Francis Song
Emilio Parisotto
Subhodeep Moitra
Edward Hughes
Iain Dunning
Shibl Mourad
Marc G. Bellemare
Michael Bowling
Artificial Intelligence, 280 (2020)
Abstract
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with chess, checkers, and backgammon before them, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.