Harikrishna Narasimhan

Harikrishna Narasimhan

I am a Senior Research Scientist at Google Research in Mountain View, USA. Prior to joining Google, I was a post-doctoral researcher at Harvard University. I completed my PhD from the Indian Institute of Science, Bangalore. My research interests broadly lie in the areas of machine learning and learning theory, with focus on complex evaluation metrics, constrained optimization and algorithmic fairness. Please see my personal web page for my full list of publications.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Distributionally Robust Post-hoc Classifiers under Prior Shifts
    Jiaheng Wei
    Ehsan Amid
    Vincent Chu
    Yang Liu
    Abhishek Kumar
    ICLR(2023)
    Preview abstract The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present a lightweight post-hoc approach that performs scaling adjustments to predictions from a fixed pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied during test time. We propose a novel evaluation metric to test the models on a spectrum of controlled distribution shifts. While being extremely lightweight and fast, our method comes with provable guarantees and compares favorably with the existing more complex methods for this problem on standard class imbalance and group distributional robustness benchmarks. View details
    Preview abstract Many practical settings allow a learner to defer predictions to one or more costly experts. For example, the learning to defer paradigm allows a learner to defer to a human expert, at some monetary cost. Similarly, the adaptive inference paradigm allows a base model to defer to one or more large models, at some computational cost. The goal in these settings is to learn classification and deferral mechanisms to optimise a suitable accuracy-cost tradeoff. To achieve this, a central issue studied in prior work is the design of a coherent loss function for both mechanisms. In this work, we demonstrate that existing losses have two subtle limitations: they can encourage underfitting when there is a high cost of deferring, and the deferral function can have a weak dependence on the base model predictions. To resolve these issues, we propose a post-hoc training scheme: we train a deferral function on top of a base model, with the objective of predicting to defer when the base model's error probability exceeds the cost of the expert model. This may be viewed as applying a partial surrogate to the ideal deferral loss, which can lead to a tighter approximation and thus better performance. Empirically, we verify the efficacy of post-hoc training on benchmarks for learning to defer and adaptive inference. View details
    Preview abstract In real-world systems, models are frequently updated as more data becomes available, and in addition to achieving high accuracy, the goal is to also maintain a low difference in predictions compared to the base model (i.e. predictive churn). If model retraining results in vastly different behavior, then it could cause negative effects in downstream systems, especially if this churn can be avoided with limited impact on model accuracy. In this paper, we show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn. We then show that distillation performs strongly for low churn training against a number of recent baselines on a wide range of datasets and model architectures, including fully-connected networks, convolutional networks, and transformers. View details
    Implicit rate-constrained optimization of non-decomposable objectives
    Abhishek Kumar
    Andy Cotter
    Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
    Preview abstract We consider a popular family of constrained optimization problems arising in machine learning that involve optimizing a non-decomposable evaluation metric with a certain thresholded form, while constraining another metric of interest. Examples of such problems include optimizing the false negative rate at a fixed false positive rate, optimizing precision at a fixed recall, optimizing the area under the precision-recall or ROC curves, etc. Our key idea is to formulate a rate-constrained optimization that expresses the threshold parameter as a function of the model parameters via the Implicit Function theorem. We show how the resulting optimization problem can be solved using standard gradient based methods. Experiments on benchmark datasets demonstrate the effectiveness of our proposed method over existing state-of-the-art approaches for these problems. The code for the proposed method is available at this url: https://github.com/google-research/google-research/tree/master/implicit_constrained_optimization. View details
    Optimal Auctions through Deep Learning
    Paul Duetting
    Zhe Feng
    David C. Parkes
    Sai Srivatsa Ravindranath
    Communications of the ACM, 64 (Issue 8)(2021), pp. 109-116
    Preview abstract Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981. Even after 30-40 years of intense research the problem remains unsolved for settings with two or more items. We overview recent research results that show how tools from deep learning are shaping up to be become a powerful tool for the automated design of optimal auctions. In this approach, an auction is modeled as a multi-layer neural network, with optimal auction design framed as a constrained learning problem, and solved through standard machine learning pipelines. Through this approach, it is possible to recover essentially all known analytical solutions for multi-item settings, and obtain novel mechanisms for settings in which the optimal mechanism is unknown. View details
    Optimizing Blackbox Metrics with Iterative Example Weighting
    Gaurush Hiranandani
    Jatin Mathur
    Mahdi Milani Fard
    Sanmi Koyejo
    Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
    Preview abstract We consider learning to optimize a classification metric defined by a black-box function of the confusion matrix. Such black-box learning settings are ubiquitous, for example, when the learner only has query access to the metric of interest, or in noisy-label and domain adaptation applications where the learner must evaluate the metric via performance evaluation using a small validation sample. Our approach is to adaptively learn example weights on the training dataset such that the resulting weighted objective best approximates the metric on the validation sample. We show how to model and estimate the example weights and use them to iteratively post-shift a pre-trained class probability estimator to construct a classifier and analyze the resulting procedure's statistical properties. Experiments on various label noise, domain shift, and fair classification setups confirm that our proposal is better than the individual state-of-the-art baselines for each application. View details
    Preview abstract In machine learning applications such as ranking fairness or fairness over intersectional groups, one often encounters optimization problems with an extremely large number of constraints. In particular, with ranking fairness tasks, there may even be a variable number of constraints, e.g. one for each query in the training set. In these cases, the standard approach of optimizing a Lagrangian while maintaining one Lagrange multiplier per constraint may no longer be practical. Our proposal is to associate a feature vector with each constraint, and to learn a "multiplier model" that maps each such vector to the corresponding Lagrange multiplier. We prove optimality and feasibility guarantees under assumptions on the flexibility of the multiplier model, and empirically demonstrate that our method is effective on real-world case studies. View details
    Pairwise Fairness for Ranking and Regression
    Andy Cotter
    Maya Gupta
    33rd AAAI Conference on Artificial Intelligence(2020)
    Preview abstract Improving fairness for ranking and regression models has less mature algorithmic tooling than classifiers. Here, we present pairwise formulations of fairness for ranking and regression models that can express analogues of statistical fairness notions like equal opportunity or equal accuracy, as well as statistical parity. The proposed framework supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using constrained optimization or robust optimization algorithms. Experiments illustrate the broad applicability and trade-offs of these methods. View details
    Optimizing Black-box Metrics with Adaptive Surrogates
    Qijia Jiang
    Olaoluwa Adigun
    Mahdi Milani Fard
    Maya R. Gupta
    Proceedings of the 37th International Conference on Machine Learning (ICML 2020)
    Preview abstract We address the problem of training models with black-box and hard-to-optimize metrics by expressing the metric as a monotonic function of a small number of easy-to-optimize surrogates. We pose the training problem as an optimization over a relaxed surrogate space, which we solve by estimating local gradients for the metric and performing inexact convex projections. We analyze gradient estimates based on finite differences and local linear interpolations, and show convergence of our approach under smoothness assumptions with respect to the surrogates. Experimental results on classification and ranking problems verify the proposal performs on par with methods that know the mathematical formulation, and adds notable value when the form of the metric is unknown. View details
    Preview abstract We present a consistent algorithm for constrained classification problems where the objective (e.g. F-measure, G-mean) and the constraints (e.g. demographic parity fairness, coverage) are defined by general functions of the confusion matrix. Our approach reduces the problem into a sequence of plug-in classifier learning tasks. The reduction is achieved by posing the learning problem as an optimization over the intersection of two sets: the set of confusion matrices that are achievable and those that are feasible. This decoupling of the constraint space then allows us to solve the problem using a Frank-Wolfe based algorithm. For objective and constraints that are convex functions of the confusion matrix, our algorithm requires $O(1/\epsilon^2)$ calls to the plug-in subroutine, which improves on the $O(1/\epsilon^3)$ rate needed by the reduction based algorithm of Narasimhan (2018). We show empirically that our algorithm is competitive with prior methods. View details