Cristina Nader Vasconcelos

I am currently a Research Software Engineer in the Google Brain group in Montreal. I’m interested in applications of deep learning to computer vision, reinforcement learning, meta-learning, speech, and natural language processing. I have a PhD in Computer Graphics from PUC-Rio, Brazil. Previously, I was an Associate Professor at the Universidade Federal Fluminense (UFF).

Authored Publications
    Scaling Vision Transformers to 22 Billion Parameters
    Josip Djolonga
    Basil Mustafa
    Piotr Padlewski
    Justin Gilmer
    Mathilde Caron
    Rodolphe Jenatton
    Lucas Beyer
    Michael Tschannen
    Anurag Arnab
    Carlos Riquelme
    Gamaleldin Elsayed
    Fisher Yu
    Avital Oliver
    Fantine Huot
    Mark Collier
    Vighnesh Birodkar
    Yi Tay
    Alexander Kolesnikov
    Filip Pavetić
    Thomas Kipf
    Xiaohua Zhai
    Neil Houlsby
    arXiv (2023)
    The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modeling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters. We present a recipe for highly efficient training of a 22B-parameter ViT and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between bias and performance, an improved alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
    Proper Reuse of Image Classification Features Improves Object Detection
    Vighnesh Nandan Birodkar
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR (2022), pp. 13628-13637
    A largely accepted practice in transfer learning is to pre-train a model on a data-abundant upstream task and use the pre-trained weights for model initialization on the downstream task. Specifically, in Object Detection (OD) it is common to initialize the feature backbone with pre-trained ImageNet classifier weights and fine-tune those weights along with the other detection model parameters. Recent work has shown that this practice is not strictly necessary and that it is possible to train an object detector from scratch by training for much longer. In this work we investigate the opposite end of the training spectrum and keep the feature backbone frozen during object detection training, preserving the classifier initialization. Contrary to the common belief that object detectors benefit from end-to-end training, we conjecture that the weight initialization obtained from training a classifier contains useful knowledge that is forgotten by fine-tuning or avoided entirely when training from scratch, with negative consequences for long-tail classes. As an immediate contribution of our findings, we show that it is possible to train an off-the-shelf object detection model with similar if not superior performance while significantly reducing the need for computational resources, in both memory and computation (FLOPs). The performance benefits of the proposed upstream-task knowledge preservation are even clearer when results are stratified by class and by the number of annotations available. Our results on MSCOCO, LVIS and Pascal VOC show that our extreme formulation of model reuse has a clear positive impact on full-shot object detection and also on typical hard cases, such as classes with few annotations, like those found in long-tail object recognition and few-shot learning.
    Impact of Aliasing on Generalization in Deep Convolutional Networks
    Nicolas Le Roux
    Rob Romijnders
    International Conference on Computer Vision ICCV 2021, IEEE/CVF (2021)
    Traditionally, image pre-processing in the frequency domain has played a vital role in computer vision and was even part of the standard pipeline in the early days of Deep Learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself if they aid in achieving stronger performance. Frequency aliasing is a phenomenon that may occur when down-sampling (sub-sampling) any signal, such as an image or feature map. We demonstrate that substantial improvements in out-of-distribution (OOD) generalization can be obtained by mitigating the effects of aliasing by placing non-trainable blur filters and using smooth activation functions at key locations in the ResNet family of architectures, helping to achieve new state-of-the-art results on two benchmarks without any hyper-parameter sweeps.