Vincent Dumoulin

Authored Publications
    Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
    Mike Mozer
    Proceedings of the 39th International Conference on Machine Learning, PMLR (2022)
    Abstract: Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method, fine-tuning all parameters of the source model to the target domain, possibly because fine-tuning allows the model to leverage useful information from intermediate layers that is otherwise discarded. We explore the hypothesis that these intermediate layers might be directly exploited by linear probing. We propose a method, Head2Toe, that selects features from all layers of the source model to train a target-domain classification head. In evaluations on the Visual Task Adaptation Benchmark, Head2Toe matches the performance obtained with fine-tuning on average, but critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.
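As a rough illustration of the idea, the sketch below (a toy stand-in, not the paper's implementation) pools and concatenates the activations of every layer of a frozen backbone and trains only a linear head on top; Head2Toe's additional feature-selection step is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy backbone standing in for the pre-trained source model.
class ToyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            # Pool each intermediate feature map to a fixed-length vector.
            feats.append(torch.flatten(F.adaptive_avg_pool2d(x, 1), 1))
        return torch.cat(feats, dim=1)   # features from "head to toe"

backbone = ToyBackbone()
for p in backbone.parameters():
    p.requires_grad = False              # the source model stays frozen

head = nn.Linear(16 + 32 + 64, 10)       # linear probe for 10 target classes
logits = head(backbone(torch.randn(8, 3, 32, 32)))
```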
    Proper Reuse of Image Classification Features Improves Object Detection
    Vighnesh Nandan Birodkar
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR (2022), pp. 13628-13637
    Abstract: A largely accepted practice in transfer learning is to pre-train a model on a data-abundant upstream task and to use the pre-trained weights for model initialization on the downstream task. Specifically, in Object Detection (OD) it is common to initialize the feature backbone with pre-trained ImageNet classifier weights and to fine-tune those weights along with the other detection model parameters. Recent work has shown that this practice is not strictly necessary and that it is possible to train an object detector from scratch by training for much longer. In this work we investigate the opposite end of the training spectrum and keep the feature backbone frozen during object detection training, preserving the classifier initialization. Contrary to the common belief that object detectors benefit from end-to-end training, we conjecture that the weight initialization obtained from training on a classification task contains useful knowledge that is forgotten by fine-tuning or avoided entirely when training from scratch, with negative consequences for long-tail classes. As an immediate contribution of our findings, we show that it is possible to train an off-the-shelf object detection model with similar, if not superior, performance while significantly reducing the need for computational resources, both in memory and in computation (FLOPs). The performance benefits of the proposed upstream-task knowledge preservation are even clearer when stratifying results by class and by the number of available annotations. Our results on MSCOCO, LVIS and Pascal VOC show that our extreme formulation of model reuse has a clear positive impact on full-shot object detection and also on typically hard cases, such as classes with few annotations, as found in long-tail object recognition and few-shot learning.
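A minimal sketch of the frozen-backbone setup, assuming a recent torchvision (>= 0.13) and its standard Faster R-CNN builder; the training recipe in the paper differs, and the weight names used here are assumptions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Build a detector whose ResNet trunk is initialized from ImageNet classifier
# weights, then freeze that trunk so the classifier initialization is
# preserved during detection training.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone="DEFAULT")
for p in model.backbone.body.parameters():
    p.requires_grad = False   # frozen feature backbone; FPN, RPN and heads still train

# Only the detection-specific parameters are passed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```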
    Abstract: Few-shot dataset generalization is a challenging variant of the well-studied few-shot classification problem where a diverse training set of several datasets is given, for the purpose of training an adaptable model that can then learn classes from new datasets using only a few examples. To this end, we propose to utilize the diverse training set to construct a universal template: a structure that can define a wide array of dataset-specialized models, by plugging in appropriate parameter-light components. For each new few-shot classification problem, our approach therefore only requires inferring a small number of task-specific parameters to insert into the universal template. We design a separate network that produces a carefully-crafted initialization of those parameters for each given task, and we then fine-tune its proposed initialization via a few steps of gradient descent. Our approach is more parameter-efficient, scalable and adaptable compared to previous methods, and achieves state-of-the-art on the challenging Meta-Dataset benchmark.
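The sketch below is a hypothetical, heavily simplified illustration of this recipe: a frozen shared backbone, a task encoder that proposes an initialization for a handful of task-specific scale/shift parameters, and a few gradient steps that refine them on the support set. Names, sizes and the exact parameterization are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # shared "universal template"
for p in backbone.parameters():
    p.requires_grad = False                                # stays frozen across tasks

task_encoder = nn.Linear(64, 2 * 128)                      # proposes [gamma, beta]

def adapt_to_task(support_x, support_y, num_classes, steps=5, lr=0.1):
    # Carefully-crafted initialization of the task-specific parameters ...
    init = task_encoder(support_x).mean(dim=0).detach()
    gamma = init[:128].clone().requires_grad_()
    beta = init[128:].clone().requires_grad_()
    head = nn.Linear(128, num_classes)
    optimizer = torch.optim.SGD([gamma, beta, *head.parameters()], lr=lr)
    # ... refined with a few steps of gradient descent on the support set.
    for _ in range(steps):
        logits = head(gamma * backbone(support_x) + beta)
        loss = F.cross_entropy(logits, support_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return gamma, beta, head

# Toy 5-way episode with 64-d inputs, 5 support examples per class.
gamma, beta, head = adapt_to_task(
    torch.randn(25, 64), torch.arange(5).repeat_interleave(5), num_classes=5)
```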
    Impact of Aliasing on Generalization in Deep Convolutional Networks
    Nicolas Le Roux
    Rob Romijnders
    International Conference on Computer Vision (ICCV), IEEE/CVF (2021)
    Abstract: Traditionally, image pre-processing in the frequency domain played a vital role in computer vision and was even part of the standard pipeline in the early days of Deep Learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself if they aid in achieving stronger performance. Frequency aliasing is a phenomenon that may occur when down-sampling (sub-sampling) any signal, such as an image or feature map. We demonstrate that substantial improvements in out-of-distribution (OOD) generalization can be obtained by mitigating the effects of aliasing: placing non-trainable blur filters and using smooth activation functions at key locations in the ResNet family of architectures helps achieve new state-of-the-art results on two benchmarks without any hyper-parameter sweeps.
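As a concrete illustration of the anti-aliasing ingredient, the sketch below applies a fixed (non-trainable) binomial blur filter before strided subsampling and pairs it with a smooth activation; the exact filters and placements used in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Fixed low-pass (binomial) filter applied depthwise before subsampling;
    the filter is a buffer, so it is never trained."""
    def __init__(self, channels, stride=2):
        super().__init__()
        self.stride = stride
        k = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(k, k)
        kernel = kernel / kernel.sum()   # normalized 3x3 blur
        self.register_buffer("kernel", kernel.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        # Depthwise convolution: each channel is blurred independently,
        # then subsampled with the given stride.
        return F.conv2d(x, self.kernel, stride=self.stride, groups=x.shape[1])

# Smooth activation (SiLU) instead of ReLU, plus blur before the stride.
block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1),
    nn.SiLU(),
    BlurPool2d(64, stride=2),
)
out = block(torch.randn(1, 64, 32, 32))  # -> (1, 64, 16, 16)
```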
    Domain Conditional Predictors for Domain Adaptation
    Dar-Shyang Lee
    Jianqiao Feng
    Joao Monteiro
    Xavier Gibert
    Proceedings of Machine Learning Research (PMLR), 148 (2021), pp. 193-220
    Abstract: Learning guarantees often rely on assumptions of i.i.d. data, which will likely be violated in practice once predictors are deployed to perform real-world tasks. Domain adaptation approaches thus appeared as a useful framework yielding extra flexibility in that distinct train and test data distributions are supported, provided that other assumptions are satisfied, such as covariate shift, which expects the conditional distributions over labels to be independent of the underlying data distribution. Several approaches have been introduced to induce generalization across varying train and test data sources, and those often rely on the general idea of domain invariance, in such a way that the data-generating distributions are to be disregarded by the prediction model. In this contribution, we tackle the problem of generalizing across data sources by approaching it from the opposite direction: we consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution. That is, the model has an explicit mechanism to adapt to changing environments and/or new data sources. We argue that such an approach is more general than current domain adaptation methods, since it does not require extra assumptions such as covariate shift and further yields simpler training algorithms that avoid a common source of training instabilities caused by the minimax formulations often employed in domain-invariant methods.
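A minimal sketch of the conditional-modeling idea: the classifier consumes both the input and a learned representation of its data source. For brevity the domain identity is passed in explicitly here, whereas the paper conditions on domain information produced by a learned component; the architecture below is illustrative only.

```python
import torch
import torch.nn as nn

class DomainConditionalClassifier(nn.Module):
    """Predictions depend on the input and on a learned embedding of the
    data source it came from; sizes and layers are illustrative."""
    def __init__(self, in_dim, num_domains, num_classes, domain_dim=16):
        super().__init__()
        self.domain_embedding = nn.Embedding(num_domains, domain_dim)
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(128 + domain_dim, num_classes)

    def forward(self, x, domain_id):
        h = self.encoder(x)
        d = self.domain_embedding(domain_id)   # information about the data source
        return self.classifier(torch.cat([h, d], dim=-1))

model = DomainConditionalClassifier(in_dim=32, num_domains=4, num_classes=10)
logits = model(torch.randn(8, 32), torch.randint(0, 4, (8,)))
```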
    Abstract: Meta-learning and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation from each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we introduce a few-shot classification evaluation protocol named VTAB+MD with the explicit goal of facilitating the sharing of insights between the two communities. We demonstrate its accessibility in practice by performing a cross-family study of the best transfer and meta learners, which report results on both a large-scale meta-learning benchmark (Meta-Dataset, MD) and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. We hope that this work contributes to accelerating progress on few-shot learning research.
    Abstract: Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have recently emerged to tackle this problem, we find the current procedure and datasets used to systematically assess progress in this setting lacking. To address this, we propose META-DATASET: a new benchmark for training and evaluating few-shot classifiers that is large-scale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of state-of-the-art models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure the robustness of current methods to variations in the number of available examples and the number of classes. Finally, our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular few-shot classification methods, and to propose a new method, ProtoMAML, which achieves improved performance on our benchmark.
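For context on ProtoMAML, the sketch below shows the Prototypical-Networks-equivalent initialization of a linear output layer from class prototypes, which ProtoMAML then adapts with MAML-style gradient steps; this is a simplified reconstruction, not the reference implementation.

```python
import torch
import torch.nn.functional as F

def proto_init_linear(support_embeddings, support_labels, num_classes):
    """Initialize a linear output layer from class prototypes: a squared-
    Euclidean prototype classifier is equivalent to a linear layer with
    weights 2*c_k and biases -||c_k||^2 (up to a class-independent term)."""
    prototypes = torch.stack([
        support_embeddings[support_labels == k].mean(dim=0)
        for k in range(num_classes)
    ])
    weight = 2.0 * prototypes
    bias = -(prototypes ** 2).sum(dim=1)
    return weight, bias

# Toy 5-way episode with 64-d support embeddings (5 examples per class).
embeddings = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
W, b = proto_init_linear(embeddings, labels, num_classes=5)
query_logits = F.linear(torch.randn(10, 64), W, b)  # before any gradient steps
```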
    The Hanabi Challenge: A New Frontier for AI Research
    Nolan Bard
    Jakob N. Foerster
    Sarath Chandar
    Neil Burch
    Marc Lanctot
    H. Francis Song
    Emilio Parisotto
    Subhodeep Moitra
    Edward Hughes
    Iain Dunning
    Shibl Mourad
    Marc G. Bellemare
    Michael Bowling
    Artificial Intelligence, 280 (2020)
    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory-of-mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
    Feature-wise transformations
    Ethan Perez
    Nathan Schucher
    Florian Strub
    Harm de Vries
    Aaron Courville
    Yoshua Bengio
    Distill (2018)
    Abstract: In this article, we dive into the subject of feature-wise transformations, showing that they find their way into a surprising number of recent neural network architectures used in various problem settings. We discuss feature-wise transformations as a family of related approaches and show how they can be conceptualized using the Feature-wise Linear Modulation (FiLM) nomenclature. We then point out their numerous uses in the recent literature. Finally, we take a look at interesting and intriguing properties that arise from the use of FiLM.
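In FiLM terms, a conditioning input produces a per-feature scale (gamma) and shift (beta) that modulate another network's activations. The sketch below is a minimal, generic FiLM layer; the sizes and the choice of a single linear conditioning layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: a conditioning input is mapped to a
    per-feature scale (gamma) and shift (beta) applied to another network's
    activations."""
    def __init__(self, cond_dim, num_features):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_features)

    def forward(self, features, conditioning):
        gamma, beta = self.to_gamma_beta(conditioning).chunk(2, dim=-1)
        # Broadcast over the spatial dimensions of (N, C, H, W) feature maps.
        return gamma[:, :, None, None] * features + beta[:, :, None, None]

film = FiLM(cond_dim=32, num_features=64)
feature_map = torch.randn(8, 64, 16, 16)   # e.g. convolutional activations
conditioning = torch.randn(8, 32)          # e.g. an embedded question or task
modulated = film(feature_map, conditioning)
```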