Ross Goroshin

I am currently a Research Scientist on the Google Brain Montreal team. My main area of expertise is deep learning, and I am interested in building flexible and robust computer vision systems by applying ideas from self-supervised learning and meta-learning. Prior to joining the Brain team I was at DeepMind (London, UK), where I worked on navigation problems using reinforcement learning. I completed my PhD under the supervision of Yann LeCun at NYU.
Authored Publications
    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
    Jesse Farebrother
    Joshua Greaves
    Charline Le Lan
    Marc Bellemare
    International Conference on Learning Representations (ICLR) (2023)
    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning; accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
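To make the general pattern concrete, here is a minimal sketch of a network trained through many auxiliary prediction heads alongside a main value head. The successor-measure targets themselves are paper-specific and omitted; the architecture, sizes, and names below are illustrative assumptions, not the paper's implementation.

```python
# A generic auxiliary-task setup: a shared torso whose representation is
# shaped by many auxiliary prediction heads in addition to the value head.
# All sizes and names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class AuxiliaryTaskNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, num_aux_tasks: int,
                 hidden: int = 256):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, num_actions)   # main RL objective
        self.aux_heads = nn.Linear(hidden, num_aux_tasks)  # auxiliary targets

    def forward(self, obs: torch.Tensor):
        features = self.torso(obs)
        return self.value_head(features), self.aux_heads(features)
```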
    Abstract: Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we introduce a few-shot classification evaluation protocol named VTAB+MD with the explicit goal of facilitating sharing of insights from each community. We demonstrate its accessibility in practice by performing a cross-family study of the best transfer and meta learners which report on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. We hope that this work contributes to accelerating progress on few-shot learning research.
    Impact of Aliasing on Generalization in Deep Convolutional Networks
    Nicolas Le Roux
    Rob Romijnders
    International Conference on Computer Vision (ICCV), IEEE/CVF (2021)
    Abstract: Traditionally, image pre-processing in the frequency domain played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary, believing that such priors can be learned from the data itself if they aid in achieving stronger performance. Frequency aliasing is a phenomenon that may occur when down-sampling (sub-sampling) any signal, such as an image or feature map. We demonstrate that substantial improvements in out-of-distribution (OOD) generalization can be obtained by mitigating the effects of aliasing: placing non-trainable blur filters and using smooth activation functions at key locations in the ResNet family of architectures helps achieve new state-of-the-art results on two benchmarks without any hyper-parameter sweeps.
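The core mitigation the abstract describes, low-pass filtering before subsampling, can be sketched in a few lines. The 3x3 binomial kernel and its placement below are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal sketch of anti-aliased downsampling: blur each channel with a
# fixed (non-trainable) binomial filter, then subsample by striding. Kernel
# size and normalization are assumptions for illustration.
import torch
import torch.nn.functional as F

def blur_downsample(x: torch.Tensor) -> torch.Tensor:
    channels = x.shape[1]
    k1d = torch.tensor([1.0, 2.0, 1.0])
    kernel = torch.outer(k1d, k1d)
    kernel = kernel / kernel.sum()              # normalize to preserve intensity
    kernel = kernel.expand(channels, 1, 3, 3)   # one filter per channel (depthwise)
    return F.conv2d(x, kernel.to(x), stride=2, padding=1, groups=channels)

x = torch.randn(8, 64, 32, 32)  # (batch, channels, height, width)
y = blur_downsample(x)          # -> (8, 64, 16, 16), high frequencies attenuated
```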
    Abstract: Fully convolutional deep correlation networks are currently the state-of-the-art approach to single-object visual tracking. It is commonly assumed that these networks perform tracking-by-detection by matching features of the object instance with features of the scene. Strong architectural priors and conditioning on the object representation are thought to encourage this tracking strategy. Despite these efforts, we show that deep trackers often default to "tracking by saliency" detection, without relying on the object representation. This leads us to introduce an auxiliary detection task that encourages more discriminative object representations and improves tracking performance.
    Abstract: Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle this problem recently, we find the current procedure and datasets used to systematically assess progress in this setting lacking. To address this, we propose META-DATASET: a new benchmark for training and evaluating few-shot classifiers that is large-scale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of state-of-the-art models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure the robustness of current methods to variations in the number of available examples and the number of classes. Finally, our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular few-shot classification methods, and to propose a new method, ProtoMAML, which achieves improved performance on our benchmark.
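For context, the Prototypical Networks baseline named above classifies queries by distance to per-class mean embeddings. The sketch below shows that rule; the embeddings are assumed to come from any learned feature extractor, and all names are illustrative.

```python
# A minimal sketch of the Prototypical Networks classification rule: average
# the support embeddings of each class into a prototype, then score queries
# by negative squared distance to each prototype. Names are illustrative.
import torch

def prototype_logits(support_emb: torch.Tensor, support_labels: torch.Tensor,
                     query_emb: torch.Tensor, num_classes: int) -> torch.Tensor:
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                 # (num_classes, dim)
    return -torch.cdist(query_emb, prototypes).pow(2)  # (num_queries, num_classes)
```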
    Vector-based Navigation using Grid-like Representations in Artificial Agents
    Alexander Pritzel
    Andrea Banino
    Benigno Uria
    Brian C Zhang
    Caswell Barry
    Charles Blundell
    Charlie Beattie
    Demis Hassabis
    Dharshan Kumaran
    Greg Wayne
    Helen King
    Hubert Soyer
    Joseph Modayil
    Koray Kavukcuoglu
    Martin J. Chadwick
    Neil Rabinowitz
    Raia Hadsell
    Razvan Pascanu
    Stephen Gaffney
    Stig Vilholm Petersen
    Thomas Degris
    Timothy Lillicrap
    Nature (2018)
    Abstract: Efficient navigation is a fundamental component of mammalian behaviour but remains challenging for artificial agents. Mammalian spatial behaviour is underpinned by grid cells in the entorhinal cortex, which provide a multi-scale periodic representation that functions as a metric for coding space. Grid cells are viewed as critical for integrating self-motion (path integration) and planning direct trajectories to goals (vector-based navigation). We report, for the first time, that brain-like grid representations can emerge as the product of optimizing a recurrent network to perform the task of path integration, providing a normative perspective on the role of grid cells as a compact code for representing space. We show that grid cells provide an effective basis set to optimize the primary objective of navigation through deep reinforcement learning (RL): the rapid discovery and exploitation of goals in complex, unfamiliar, and changeable environments. The performance of agents endowed with grid-like representations was found to surpass that of an expert human and comparison agents. Further, we demonstrate that grid-like representations enable agents to conduct shortcut behaviours reminiscent of those performed by mammals, with decoding analyses confirming that the metric quantities necessary for vector-based navigation (e.g., Euclidean distance and direction to goal) are represented within the network. Our findings show that emergent grid-like responses furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for path integration and vector-based navigation, demonstrating that the latter can be combined with path-based strategies to support navigation in complex environments.
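The path-integration task at the heart of this result can be sketched as a supervised recurrent-network setup: the network sees only self-motion (velocity) inputs and must report its current position. The LSTM, sizes, and readout below are illustrative assumptions; the paper's architecture and training targets differ in detail.

```python
# A minimal path-integration sketch: an RNN integrates velocity inputs over
# time and a linear readout predicts position. Grid-like representations are
# reported to emerge in networks trained on tasks of this kind.
import torch
import torch.nn as nn

class PathIntegrator(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.readout = nn.Linear(hidden, 2)   # predict (x, y) at every step

    def forward(self, velocities: torch.Tensor) -> torch.Tensor:
        states, _ = self.rnn(velocities)      # integrate self-motion over time
        return self.readout(states)

model = PathIntegrator()
velocities = torch.randn(16, 100, 2)          # (batch, time, vx/vy)
positions = torch.cumsum(velocities, dim=1)   # ground-truth integrated path
loss = nn.functional.mse_loss(model(velocities), positions)
```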