Pradeep Shenoy
Pradeep Shenoy leads the Cognitive Modeling & Machine Learning team at Google Research India, which aims to build expressive, robust ML systems, drawing functional and algorithmic inspiration from human cognition. Pradeep also works on modeling human behavior and cognition, with applications in personalization and human-AI interfaces. His recent work has focused on robust learning via instance reweighting and its application to a range of problem settings in applied ML. Pradeep holds a Ph.D. in Computer Science from the University of Washington and completed post-doctoral research at UC San Diego, where he worked in neuroengineering, computational neuroscience, and cognitive science. He previously led machine learning teams at Microsoft, developing and supporting large-scale production models that predict user behavior (clicks, conversions, audience segmentation, etc.) in sponsored search. Pradeep has also worked in various capacities at Microsoft Research, the Fraunhofer Institute, and Lucent Bell Laboratories.
Authored Publications
    Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights to training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, with the goal of improving classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of, improved generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain-shift challenges (ImageNet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using ViT-B on ImageNet. We also show that using naturally hard examples for validation (ImageNet-R / ImageNet-A) in LRW training for ImageNet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.
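    As a rough illustration of the bilevel setup described above, here is a minimal PyTorch sketch of a single LRW meta-step; the names (weight_net, inner_lr) and the single-virtual-step approximation are assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn.functional as F
    from torch.func import functional_call

    def lrw_meta_step(model, weight_net, x_tr, y_tr, x_val_hard, y_val_hard,
                      inner_lr=0.1):
        # Per-instance weights from the auxiliary (reweighting) network.
        w = torch.sigmoid(weight_net(x_tr)).squeeze(-1)
        params = dict(model.named_parameters())
        tr_loss = F.cross_entropy(functional_call(model, params, (x_tr,)), y_tr,
                                  reduction="none")
        inner_loss = (w * tr_loss).mean()

        # Virtual SGD step on the classifier; create_graph=True lets gradients
        # flow back through this step into weight_net (the bilevel coupling).
        grads = torch.autograd.grad(inner_loss, list(params.values()),
                                    create_graph=True)
        virtual = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

        # Outer objective: loss of the virtually updated model on the *hard*
        # validation data; in practice only weight_net's optimizer steps here.
        val_loss = F.cross_entropy(
            functional_call(model, virtual, (x_val_hard,)), y_val_hard)
        val_loss.backward()
        return val_loss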
    Using Early Readouts to Mediate Featural Bias in Distillation
    Durga Sivasubramanian
    Anmol Mekala
    Ganesh Ramakrishnan
    WACV 2024 (2024)
    Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a (student) model may have less representational capacity than the corresponding teacher model. Often, knowledge of specific problem features is used to reweight instances and rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups, in the form of confident but incorrect predictions. By leveraging these signals to mediate between teacher logits and supervised labels, we improve group fairness measures across benchmark datasets. We extend our results to the closely related but distinct problem of domain generalization, which also critically depends on the quality of learned features. We provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.
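    A minimal sketch of the early-readout idea in PyTorch; FEAT_DIM, the confidence threshold, and the hard 0/1 weighting are illustrative choices rather than the paper's exact mechanism.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    FEAT_DIM, NUM_CLASSES = 512, 10                  # illustrative sizes

    early_head = nn.Linear(FEAT_DIM, NUM_CLASSES)    # probe on an earlier layer

    def teacher_trust_weights(early_feats, labels, conf_thresh=0.9):
        # Readout from intermediate features; detach so the probe does not
        # perturb the backbone's own training.
        conf, pred = F.softmax(early_head(early_feats.detach()), dim=-1).max(dim=-1)
        # Confident-but-incorrect early predictions flag likely spurious-feature
        # instances; fall back to the supervised label there (weight 0 on teacher).
        suspicious = (conf > conf_thresh) & (pred != labels)
        return (~suspicious).float()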
    Slow concept drift is a ubiquitous yet under-studied problem in practical machine learning systems. Although recent data is more indicative of future data in these settings, naively prioritizing recent instances runs the risk of losing valuable information from the past. We propose an optimization-driven approach towards balancing instance importance over large training windows. First, we model instance relevance using a mixture of multiple timescales of decay, allowing us to capture rich temporal trends. Second, we learn an auxiliary scorer model that recovers the appropriate mixture of timescales as a function of the instance itself. Finally, we propose a nested optimization objective for learning the scorer, by which it maximizes forward transfer for the learned model. Experiments on a large real-world dataset of 39M photos over a 9-year period show up to 15% relative gains in accuracy compared to other robust learning baselines. We replicate our gains on two collections of real-world datasets for non-stationary learning, and extend our work to continual learning settings where, too, we beat SOTA methods by large margins.
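    The mixture-of-timescales weighting admits a compact sketch; the timescale grid and softmax parameterization below are assumptions for illustration.

    import numpy as np

    TIMESCALES_DAYS = np.array([30.0, 180.0, 720.0, 3650.0])   # assumed grid

    def instance_weight(age_days, scorer_logits):
        # scorer_logits: the scorer model's per-timescale output for this instance.
        mix = np.exp(scorer_logits - scorer_logits.max())
        mix /= mix.sum()                             # softmax over timescales
        return float(mix @ np.exp(-age_days / TIMESCALES_DAYS))

    An instance the scorer maps mostly to the 30-day component decays quickly, while one mapped to the 10-year component remains influential across the full training window.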
    Predictive uncertainty, a model's self-awareness regarding its accuracy on an input, is key both for building robust models via training interventions and for test-time applications such as selective classification. We propose a novel instance-conditioned reweighting approach that captures predictive uncertainty using an auxiliary network, and unifies these train- and test-time applications. The auxiliary network is trained using a meta-objective in a bilevel optimization framework. A key contribution of our proposal is the meta-objective of minimizing dropout variance, an approximation of Bayesian predictive uncertainty. We show in controlled experiments that this meta-objective effectively captures diverse specific notions of uncertainty, while previous approaches capture only certain aspects. These results translate to significant gains in real-world settings (selective classification, label noise, domain adaptation, calibration) and across datasets (ImageNet, CIFAR-100, diabetic retinopathy, CAMELYON, WILDS, ImageNet-C/-A/-R, Clothing-1M, etc.). For diabetic retinopathy, we see up to 3.4%/3.3% accuracy and AUC gains over SOTA in selective classification. We also improve upon large-scale pretrained models such as PLEX.
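    The dropout-variance meta-objective rests on a quantity that is easy to sketch: the Monte-Carlo dropout variance of the predictive distribution (the bilevel wiring around it is omitted, and a real implementation would enable only the dropout layers, not full train mode).

    import torch

    def mc_dropout_variance(model, x, n_samples=8):
        # Keep dropout active at evaluation time to sample the predictive
        # distribution; this approximates Bayesian predictive uncertainty.
        model.train()
        with torch.no_grad():
            probs = torch.stack([model(x).softmax(dim=-1)
                                 for _ in range(n_samples)])
        return probs.var(dim=0).sum(dim=-1)   # one uncertainty score per instance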
    Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
    Aditay Tripathi
    Rishubh Singh
    Anirban Chakraborty
    Computer Vision and Pattern Recognition (CVPR) 2023 (2023) (to appear)
    Recent work has shown that deep vision models tend to be overly dependent on low-level or “texture” features, leading to poor generalization. Various data augmentation strategies have been proposed to overcome this so-called texture bias in DNNs. We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes for accurate prediction in an object classification setting. Our augmentations superpose automatically detected edgemaps from one image onto another image with shuffled patches, using a randomly determined mixing proportion, and assign the image label of the edgemap image. To classify these augmented images, the model needs not only to detect and focus on edges but also to distinguish between relevant and spurious edges. We show that our augmentations significantly improve classification accuracy and robustness measures on a range of datasets and neural architectures; for example, ViT-large accuracy on ImageNet classification increases by up to 6%, with similar gains on related metrics. Analysis using a range of probe datasets shows substantially increased shape sensitivity in our trained models, explaining the observed improvement in both classification accuracy and downstream tasks such as segmentation.
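    A hedged sketch of the augmentation described above: Sobel edge extraction, the 8-pixel patch size, and the uniform mixing range are illustrative stand-ins for the paper's choices.

    import numpy as np
    from scipy import ndimage

    def edge_superpose(img_a, label_a, img_b, patch=8, rng=None):
        rng = rng or np.random.default_rng()
        # Edge map of image A (grayscale Sobel magnitude, normalized to [0, 1]).
        gray = img_a.mean(axis=-1)
        edges = np.hypot(ndimage.sobel(gray, axis=0), ndimage.sobel(gray, axis=1))
        edges /= edges.max() + 1e-8

        # Patch-shuffle image B to destroy its global shape (assumes dimensions
        # divisible by `patch`).
        h, w, _ = img_b.shape
        coords = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
        tiles = [img_b[i:i + patch, j:j + patch] for (i, j) in coords]
        order = rng.permutation(len(tiles))
        shuffled = np.empty_like(img_b)
        for k, (i, j) in enumerate(coords):
            shuffled[i:i + patch, j:j + patch] = tiles[order[k]]

        # Superpose with a random mixing proportion; the label comes from the
        # edge image, so the model must read shape, not texture, to be correct.
        lam = rng.uniform(0.3, 0.7)
        return lam * edges[..., None] + (1 - lam) * shuffled, label_a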
    Reliable outlier detection is critical for real-world applications of deep learning models. Likelihoods produced by deep generative models, although extensively studied, have been largely dismissed as being impractical for outlier detection. For one, deep generative model likelihoods are readily biased by low-level input statistics. Second, many recent solutions for correcting these biases are computationally expensive or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations that we term “shaking” and “stirring”, which ameliorate low-level biases and isolate the contribution of long-range dependencies to the PixelCNN++ likelihood. These transformations are computationally inexpensive and readily applied at evaluation time. We evaluate our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection performance. In sum, lightweight remedies suffice to achieve robust outlier detection on images with deep autoregressive models.
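    The exact “shaking” and “stirring” transforms are not reproduced here; as a stand-in, the sketch below uses a fixed pixel permutation, which is likewise bijective and likewise disrupts local dependencies, to show how such a transform can slot into an evaluation-time outlier score.

    import numpy as np

    def make_pixel_permutation(shape, seed=0):
        # A fixed, invertible rearrangement of pixels: an illustrative
        # bijection, not the paper's "shaking"/"stirring" transforms.
        perm = np.random.default_rng(seed).permutation(int(np.prod(shape)))
        return lambda x: x.reshape(-1)[perm].reshape(shape)

    def outlier_score(log_likelihood_fn, x, bijection):
        # Contrast the model's likelihood before and after disrupting local
        # structure, isolating the long-range component (sign conventions vary).
        return log_likelihood_fn(bijection(x)) - log_likelihood_fn(x)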
    Adaptive mixing of auxiliary losses in supervised learning
    Durga Sivasubramanian
    Ayush Maheshwari
    Prathosh AP
    Ganesh Ramakrishnan
    AAAI 2023 (2023) (to appear)
    In several supervised learning scenarios, auxiliary losses are used to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic the outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions, which may be noisy rule-based approximations to true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bilevel optimization criterion on validation data to learn optimal mixing weights, at an instance level, over the training data. We describe a meta-learning approach to solving this bilevel objective, and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains. We empirically analyze our method and share insights into the mechanisms through which it provides performance gains.
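    A sketch of AMAL-style instance-level loss mixing for the distillation case; the sigmoid parameterization and the temperature are assumptions, and the bilevel training of mix_net on validation data is omitted.

    import torch
    import torch.nn.functional as F

    def mixed_loss(student_logits, teacher_logits, labels, mix_net, feats, T=2.0):
        ce = F.cross_entropy(student_logits, labels, reduction="none")
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="none").sum(dim=-1) * (T * T)
        # Per-instance mixing weight from an auxiliary network, learned via the
        # bilevel criterion on validation data (not shown here).
        lam = torch.sigmoid(mix_net(feats)).squeeze(-1)
        return (lam * ce + (1.0 - lam) * kd).mean()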
    Overcoming simplicity bias in deep networks using a feature sieve
    International Conference on Machine Learning (ICML) (2023) (to appear)
    Simplicity bias is the concerning tendency of deep networks to over-depend on simple, weakly predictive features, to the exclusion of stronger, more complex features. This causes biased, incorrect model predictions in many real-world applications when models are trained on incomplete data containing spurious feature-label correlations. We propose a direct, interventional method for addressing simplicity bias in DNNs, called the feature sieve. We aim to automatically identify and suppress easily computable features in lower layers of the network, thereby allowing the higher network levels to extract and utilize richer, more meaningful representations. We provide concrete evidence of this differential suppression and enhancement of features using controlled datasets, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on ImageNet-A; 3.2% on BAR, etc.). Crucially, we outperform many baselines that incorporate knowledge about “simple” features or known spurious attributes, despite our method not using any such information. We believe that our feature sieve work opens up exciting new research directions in automatic adversarial feature extraction techniques for deep networks.
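    A hedged sketch of an alternating identify/forget scheme in the spirit of the feature sieve; the KL-to-uniform forgetting loss and the optimizer wiring are simplified assumptions (in particular, the auxiliary head would be frozen during the forget step).

    import torch
    import torch.nn.functional as F

    def identify_step(aux_head, lower_feats, labels, aux_opt):
        # Train an auxiliary head to predict labels from lower-layer features,
        # exposing which "easy" features those layers carry.
        loss = F.cross_entropy(aux_head(lower_feats.detach()), labels)
        aux_opt.zero_grad(); loss.backward(); aux_opt.step()

    def forget_step(aux_head, lower_feats, main_opt, num_classes):
        # Update the backbone so the (frozen) auxiliary head becomes
        # uninformative, suppressing easily computable lower-layer features.
        log_probs = F.log_softmax(aux_head(lower_feats), dim=-1)
        uniform = torch.full_like(log_probs, 1.0 / num_classes)
        loss = F.kl_div(log_probs, uniform, reduction="batchmean")
        main_opt.zero_grad(); loss.backward(); main_opt.step()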
    Concept bottleneck models (CBMs) (Koh et al. 2020) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings, where the model can query a human collaborator for the labels of some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and the influence of the concept on the final prediction achieves strong performance, outperforming both a static approach proposed in Koh et al. (2020) and active feature acquisition methods proposed in the literature. We show that the interactive CBM achieves accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds dataset and the CheXpert dataset.
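    The interaction policy admits a compact sketch; Bernoulli-variance uncertainty and linear-weight influence are one simple instantiation of the uncertainty-times-influence combination, not necessarily the paper's exact scoring.

    import numpy as np

    def next_concept_to_query(concept_probs, concept_weights, already_asked):
        # concept_probs: predicted probability of each binary concept
        # concept_weights: per-concept effect size on the final label head
        uncertainty = concept_probs * (1.0 - concept_probs)   # Bernoulli variance
        score = uncertainty * np.abs(concept_weights)
        score[list(already_asked)] = -np.inf                  # don't re-ask
        return int(np.argmax(score))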
    The options framework in hierarchical reinforcement learning breaks down overall goals into a combination of options, or simpler tasks, and associated policies, allowing for abstraction in the action space. Ideally, these options can be reused across different higher-level goals; indeed, many previous approaches have proposed limited forms of transfer of pre-learned options to new task settings. We propose a novel "option indexing" approach to hierarchical learning (OI-HRL), where we learn an affinity function between options and the functionalities (or affordances) supported by the environment. This allows us to effectively reuse a large library of pretrained options, in zero-shot generalization at test time, by restricting goal-directed learning to only those options relevant to the task at hand. We develop a meta-training loop that learns the representations of options and environment affordances over a series of HRL problems, by incorporating feedback about the relevance of retrieved options to the higher-level goal. In addition to a substantial decrease in sample complexity compared to learning HRL policies from scratch, we also show significant gains over baselines that have the entire option pool available for learning the hierarchical policy.
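    A sketch of option retrieval via a learned affinity; cosine similarity over learned embedding spaces is an illustrative choice for the affinity function.

    import numpy as np

    def retrieve_options(option_embs, affordance_emb, top_k=10):
        # option_embs: (num_options, d) learned option representations
        # affordance_emb: (d,) embedding of the current environment's affordances
        O = option_embs / np.linalg.norm(option_embs, axis=1, keepdims=True)
        a = affordance_emb / np.linalg.norm(affordance_emb)
        return np.argsort(-(O @ a))[:top_k]   # restrict HRL to these options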
    Meta-learning of dynamic policy adjustments in inhibitory control tasks
    Soumya Chatterjee
    Aakriti Kumar
    CogSci 2022 (2022) (to appear)
    Simple perceptual decision-making tasks such as the Stroop and flanker tasks are popular as a method of measuring individual variation in the processing of conflicting visual stimuli, for instance via the difference in accuracy on stimuli with and without conflict. A major challenge in applying these tasks, for instance to compare two different populations of subjects, is the low reliability of the nonparametric measures of performance in the tasks. Here, we model the dynamic adjustments in decision policies often seen in human behavior, thereby capturing trial-by-trial variation in decision policies in addition to the classically used average statistics. We propose a recurrent network model, and a novel meta-learning algorithm, MixMP, to capture behavioral strategies in the task in a model-agnostic manner, and to overcome small-sample learning challenges by pooling across subjects. We show that by splitting the learning into a complex, shared metamodel and simple subject-specific parameters, we learn significantly better predictive models, and also identify latent dimensions indexing the decision policy that may serve as a better measure of individual differences in the task.
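    A hedged sketch of the shared-metamodel split: one recurrent model pooled across subjects, with only a small learned vector per subject. Sizes and the concatenation-based conditioning scheme are assumptions, not MixMP itself.

    import torch
    import torch.nn as nn

    class SharedPolicyModel(nn.Module):
        def __init__(self, n_subjects, obs_dim=4, subj_dim=3,
                     hidden=32, n_actions=2):
            super().__init__()
            self.subject_params = nn.Embedding(n_subjects, subj_dim)  # per subject
            self.rnn = nn.GRU(obs_dim + subj_dim, hidden, batch_first=True)
            self.readout = nn.Linear(hidden, n_actions)

        def forward(self, trials, subject_ids):
            # trials: (batch, time, obs_dim); subject_ids: (batch,)
            s = self.subject_params(subject_ids).unsqueeze(1)
            s = s.expand(-1, trials.size(1), -1)
            h, _ = self.rnn(torch.cat([trials, s], dim=-1))
            return self.readout(h)   # per-trial action logits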
    FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing
    Pranav Gupta
    Ravi Kiran Sarvadevabhatla
    Rishubh Singh
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022) (to appear)
    Multi-object multi-part scene parsing is a challenging task which requires detecting multiple object classes in a scene and segmenting the semantic parts within each object. In this paper, we propose FLOAT, a factorized label space framework for scalable multi-object multi-part parsing. Our framework involves independent dense prediction of object category and part attributes, which increases scalability and reduces task complexity compared to the monolithic label space counterpart. In addition, we propose an inference-time 'zoom' refinement technique which significantly improves segmentation quality, especially for smaller objects/parts. Compared to the state of the art, FLOAT obtains an absolute improvement of 2.0% for mean IOU (mIOU) and 4.8% for segmentation quality IOU (sqIOU) on the Pascal-Part-58 dataset. For the larger Pascal-Part-108 dataset, the improvements are 2.1% for mIOU and 3.9% for sqIOU. We incorporate previously excluded part attributes and other minor parts of the Pascal-Part dataset to create the most comprehensive and challenging version, which we dub Pascal-Part-201. FLOAT obtains improvements of 8.6% for mIOU and 7.5% for sqIOU on the new dataset, demonstrating its parsing effectiveness across a challenging diversity of objects and parts.
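    The factorized label space reduces one monolithic object-part space to two smaller heads, as in the sketch below; the 1x1-convolution heads are an illustrative choice, and the 'zoom' refinement is omitted.

    import torch.nn as nn

    class FactorizedHeads(nn.Module):
        # Predict object category and part attribute densely and independently;
        # a pixel's final label is the (object, part) pair, so the label-space
        # size is n_objects + n_parts rather than n_objects * n_parts.
        def __init__(self, feat_ch, n_objects, n_parts):
            super().__init__()
            self.object_head = nn.Conv2d(feat_ch, n_objects, kernel_size=1)
            self.part_head = nn.Conv2d(feat_ch, n_parts, kernel_size=1)

        def forward(self, feats):
            return self.object_head(feats), self.part_head(feats)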
    GCR: Gradient coreset based replay buffer selection for continual learning
    Krishnateja Killamsetty
    Rishabh Iyer
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 99-108
    Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learnings across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a “coreset” that closely approximates the gradient of all the data seen so far with respect to current model parameters, and discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4%) over the state of the art in the well-studied offline continual learning setting. Our findings also effectively transfer to online/streaming CL settings, showing up to 5% gains over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% accuracy when combined with our subset selection strategy.
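    A matching-pursuit-style sketch of gradient-coreset selection; the uniform weighting of the chosen set and the greedy residual rule are simplifications of the paper's optimization criterion.

    import numpy as np

    def greedy_gradient_coreset(example_grads, budget):
        # example_grads: (n, d) per-example gradients w.r.t. current parameters.
        target = example_grads.mean(axis=0)   # gradient of all data seen so far
        chosen, residual = [], target.copy()
        for _ in range(budget):
            scores = example_grads @ residual  # candidate most aligned with residual
            scores[chosen] = -np.inf           # no repeats
            chosen.append(int(np.argmax(scores)))
            residual = target - example_grads[chosen].mean(axis=0)
        return chosen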
    Deep networks often make confident yet incorrect predictions when tested with outlier data that is far removed from their training distributions. Likelihoods computed by deep generative models (DGMs) are a candidate metric for outlier detection with unlabeled data. Yet previous studies have shown that DGM likelihoods are unreliable and can be easily biased by simple transformations to input data. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest of DGMs. We propose novel analytical and algorithmic approaches to ameliorate key biases with VAE likelihoods. Our bias corrections are sample-specific, computationally inexpensive, and readily computed for various decoder visible distributions. Next, we show that a well-known image pre-processing technique, contrast stretching, extends the effectiveness of bias correction to further improve outlier detection. Our approach achieves state-of-the-art accuracies with nine grayscale and natural image datasets, and demonstrates significant advantages, in both speed and performance, over four recent competing approaches. In summary, lightweight remedies suffice to achieve robust outlier detection with VAEs.
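    Contrast stretching itself is a standard pre-processing step and easy to sketch (percentile bounds are a common choice); the paper's sample-specific likelihood corrections are not reproduced here.

    import numpy as np

    def contrast_stretch(img, lo_pct=2.0, hi_pct=98.0):
        # Rescale intensities so the given percentiles map to [0, 1].
        lo, hi = np.percentile(img, [lo_pct, hi_pct])
        return np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)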
    In decision-making tasks under uncertainty, humans display characteristic biases in seeking, integrating, and acting upon information relevant to the task. Here, we build upon carefully designed experiments, and data collected at scale (Hunt et al. 2016), that measured and catalogued these biases in aggregate form. We design deep learning models that replicate these biases in aggregate, while also capturing individual variation in behavior. A key finding of our work is that the paucity of data collected from each individual subject can be overcome by sampling large numbers of subjects from the population, while still capturing individual differences. In addition, we can predict human behavior with high accuracy without making any assumptions about task goals, reward structure, or individual biases, thus providing a model-agnostic fit to human behavior in the task. Such an approach can sidestep potential limitations in modeler-specified inductive biases, and has implications for computational modeling of human cognitive function in general, and of human-AI interfaces in particular.
    Tracking what matters: a decision-variable account of human behavior in bandit tasks
    Vishwajeet Agrawal
    Annual Meeting of the Cognitive Science Society (CogSci 2021) (2021) (to appear)
    We study human learning and decision-making in tasks with probabilistic rewards. Recent studies in a 2-armed bandit task find that a modification of classical Q-learning algorithms, with context-dependent learning rates, better explains behavior compared to constant learning rates. We propose a simple alternative: humans directly track the decision variable underlying choice in the task. Under this reframing, the asymmetric learning rates can be reinterpreted as moving towards certainty in choice. We describe how our model incorporates partial feedback (outcomes on chosen arms) and complete feedback (outcomes on chosen and unchosen arms), and show that our model significantly outperforms previously proposed models on a range of datasets. Our reframing of the computational models adds nuance to previous findings of perseverative behavior in bandit tasks; we show evidence of context-dependent choice perseveration, i.e., that humans persevere in their choices unless contradictory evidence is presented.
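    A hedged sketch of the decision-variable reframing for a 2-armed bandit: a single signed variable is nudged toward choice certainty, rather than two independent Q-values being maintained. The update form and learning rate below are illustrative, not the paper's fitted model.

    def update_decision_variable(dv, chosen_arm, rewarded, alpha=0.2):
        # dv > 0 favors arm 0, dv < 0 favors arm 1; a rewarded choice pushes dv
        # toward certainty for that arm, an unrewarded one pushes it away.
        target = 1.0 if chosen_arm == 0 else -1.0
        if not rewarded:
            target = -target
        return dv + alpha * (target - dv)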