Harikrishna Narasimhan
I am a Senior Research Scientist at Google Research in Mountain View, USA. Prior to joining Google, I was a post-doctoral researcher at Harvard University. I completed my PhD at the Indian Institute of Science, Bangalore. My research interests broadly lie in machine learning and learning theory, with a focus on complex evaluation metrics, constrained optimization, and algorithmic fairness.
Please see my personal web page for my full list of publications.
Authored Publications
Abstract
Large language models (LLMs) have shown strong results on a range of applications, including regression and scoring tasks. Typically, one obtains outputs from an LLM via autoregressive sampling from the model's output distribution. We show that this inference strategy can be sub-optimal for common regression and scoring evaluation metrics. As a remedy, we build on prior work on Minimum Bayes Risk decoding, and propose alternate inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed form from sampled responses. We show that our proposal significantly improves over baselines across datasets and models.
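The closed-form estimators referenced above follow standard Bayes-optimal rules: under squared error, the optimal point prediction is the mean of the model's output distribution; under absolute error, it is the median. A minimal sketch, assuming numeric scores have already been parsed from the sampled responses (the sampling and parsing steps are not shown):

```python
import numpy as np

def bayes_optimal_prediction(scores, metric="squared_error"):
    """Closed-form Bayes-optimal point prediction from sampled scores."""
    scores = np.asarray(scores, dtype=float)
    if metric == "squared_error":
        return scores.mean()       # the mean minimizes expected squared error
    if metric == "absolute_error":
        return np.median(scores)   # the median minimizes expected absolute error
    raise ValueError(f"unsupported metric: {metric}")

# Usage: scores parsed from, say, four sampled responses to one prompt.
print(bayes_optimal_prediction([7.0, 8.0, 8.0, 9.5], "squared_error"))  # 8.125
```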
Abstract
The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present a lightweight post-hoc approach that performs scaling adjustments to predictions from a fixed pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied during test time. We propose a novel evaluation metric to test the models on a spectrum of controlled distribution shifts. While being extremely lightweight and fast, our method comes with provable guarantees and compares favorably with the existing more complex methods for this problem on standard class imbalance and group distributional robustness benchmarks.
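As a rough illustration of post-hoc scaling for class-prior shift, the generic adjustment rule below reweights a fixed model's class probabilities toward a chosen target prior. This is a simplified stand-in for the flavor of adjustment involved, not the paper's constrained-optimization procedure:

```python
import numpy as np

def prior_adjusted_predict(probs, train_priors, target_priors):
    """Post-hoc scaling of a fixed model's class probabilities toward a
    chosen target class-prior (a generic logit-adjustment sketch)."""
    probs = np.asarray(probs, dtype=float)
    # Reweight each class by target_prior / train_prior, then renormalize.
    adjusted = probs * (np.asarray(target_priors) / np.asarray(train_priors))
    adjusted /= adjusted.sum(axis=-1, keepdims=True)
    return adjusted.argmax(axis=-1)

# Usage: a model trained with priors [0.9, 0.1], evaluated under [0.5, 0.5].
print(prior_adjusted_predict([[0.7, 0.3]], [0.9, 0.1], [0.5, 0.5]))  # [1]
```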
Post-hoc estimators for learning to defer to an expert
Aditya Krishna Menon
Advances in Neural Information Processing Systems (2022)
Abstract
Many practical settings allow a learner to defer predictions to one or more costly experts. For example, the learning to defer paradigm allows a learner to defer to a human expert, at some monetary cost. Similarly, the adaptive inference paradigm allows a base model to defer to one or more large models, at some computational cost. The goal in these settings is to learn classification and deferral mechanisms to optimise a suitable accuracy-cost tradeoff. To achieve this, a central issue studied in prior work is the design of a coherent loss function for both mechanisms. In this work, we demonstrate that existing losses have two subtle limitations: they can encourage underfitting when there is a high cost of deferring, and the deferral function can have a weak dependence on the base model predictions. To resolve these issues, we propose a post-hoc training scheme: we train a deferral function on top of a base model, with the objective of predicting to defer when the base model's error probability exceeds the cost of the expert model. This may be viewed as applying a partial surrogate to the ideal deferral loss, which can lead to a tighter approximation and thus better performance. Empirically, we verify the efficacy of post-hoc training on benchmarks for learning to defer and adaptive inference.
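The deferral target described above admits a simple rule: route an example to the expert when the base model's estimated error probability exceeds the cost of deferring. A sketch of that rule, assuming calibrated base-model probabilities (the paper instead trains a deferral function to predict this condition):

```python
import numpy as np

def post_hoc_defer(base_probs, defer_cost):
    """Defer when the base model's estimated error probability exceeds
    the cost of consulting the expert."""
    base_probs = np.asarray(base_probs, dtype=float)
    error_prob = 1.0 - base_probs.max(axis=-1)   # estimated P(base model errs)
    return error_prob > defer_cost               # True => route to the expert

# Usage: defer the uncertain second example when the expert costs 0.3.
print(post_hoc_defer([[0.95, 0.05], [0.55, 0.45]], defer_cost=0.3))  # [False  True]
```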
Abstract
In real-world systems, models are frequently updated as more data becomes available, and in addition to achieving high accuracy, the goal is to also maintain a low difference in predictions compared to the base model (i.e. predictive churn). If model retraining results in vastly different behavior, then it could cause negative effects in downstream systems, especially if this churn can be avoided with limited impact on model accuracy. In this paper, we show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn. We then show that distillation performs strongly for low churn training against a number of recent baselines on a wide range of datasets and model architectures, including fully-connected networks, convolutional networks, and transformers.
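A minimal sketch of distillation viewed as a soft churn constraint, with the base model as teacher; the mixing weight `alpha` and the exact loss composition are illustrative assumptions rather than the paper's precise recipe:

```python
import torch
import torch.nn.functional as F

def low_churn_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Fit the labels while anchoring the new model to the base (teacher)
    model's predictions; `alpha` trades accuracy against churn."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    return (1 - alpha) * hard + alpha * soft
```

With alpha = 0 this recovers standard training; with alpha = 1 it purely anchors to the base model.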
Optimal Auctions through Deep Learning
Paul Duetting
Zhe Feng
David C. Parkes
Sai Srivatsa Ravindranath
Communications of the ACM, 64 (Issue 8) (2021), pp. 109-116
Abstract
Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981. Even after 30-40 years of intense research, the problem remains unsolved for settings with two or more items. We overview recent research results that show how tools from deep learning are shaping up to become a powerful tool for the automated design of optimal auctions. In this approach, an auction is modeled as a multi-layer neural network, with optimal auction design framed as a constrained learning problem that is solved through standard machine learning pipelines. Through this approach, it is possible to recover essentially all known analytical solutions for multi-item settings, and to obtain novel mechanisms for settings in which the optimal mechanism is unknown.
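As a toy illustration of the "auction as a neural network" framing, the sketch below maps bids to a fractional allocation and payments. The architecture and the handling of incentive-compatibility constraints (regret penalties) are simplified assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class AuctionNet(nn.Module):
    """Toy auction network: bids in, allocation and payments out."""
    def __init__(self, n_bidders, n_items, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_bidders * n_items, hidden), nn.ReLU())
        self.alloc_head = nn.Linear(hidden, n_bidders * n_items)
        self.pay_head = nn.Linear(hidden, n_bidders)
        self.n_bidders, self.n_items = n_bidders, n_items

    def forward(self, bids):                      # bids: [batch, bidders, items]
        h = self.body(bids.flatten(1))
        # Softmax over bidders per item: each item is fully allocated in expectation.
        alloc = self.alloc_head(h).view(-1, self.n_bidders, self.n_items).softmax(dim=1)
        # Payments as a learned fraction of each bidder's expected value.
        frac = torch.sigmoid(self.pay_head(h))
        pay = frac * (alloc * bids).sum(dim=2)
        return alloc, pay   # training would maximize pay.sum(dim=1).mean() minus a regret penalty
```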
Implicit rate-constrained optimization of non-decomposable objectives
Abhishek Kumar
Andy Cotter
Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
Abstract
We consider a popular family of constrained optimization problems arising in machine learning that involve optimizing a non-decomposable evaluation metric with a certain thresholded form, while constraining another metric of interest. Examples of such problems include optimizing the false negative rate at a fixed false positive rate, optimizing precision at a fixed recall, and optimizing the area under the precision-recall or ROC curves. Our key idea is to formulate a rate-constrained optimization that expresses the threshold parameter as a function of the model parameters via the Implicit Function theorem. We show how the resulting optimization problem can be solved using standard gradient-based methods. Experiments on benchmark datasets demonstrate the effectiveness of our proposed method over existing state-of-the-art approaches for these problems. Code for the proposed method is available at https://github.com/google-research/google-research/tree/master/implicit_constrained_optimization.
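A simplified sketch of the implicit-threshold idea for optimizing the false negative rate at a fixed false positive rate: the threshold is written as a differentiable function of the model's scores (a quantile of the negative-class scores), so gradients flow through it. This is a loose approximation of the construction, not the released implementation linked above:

```python
import torch

def fnr_at_fixed_fpr_loss(scores_pos, scores_neg, target_fpr=0.05):
    """Smooth surrogate for FNR, with the threshold implicitly pinned
    so that the FPR is approximately `target_fpr`."""
    # Threshold = (1 - target_fpr) quantile of negative scores => FPR ~= target.
    tau = torch.quantile(scores_neg, 1.0 - target_fpr)
    # Sigmoid relaxation of the indicator [score < tau] on positives.
    return torch.sigmoid(tau - scores_pos).mean()
```

Because `torch.quantile` is differentiable, minimizing this loss updates the model while the rate constraint is maintained through the threshold.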
Optimizing Blackbox Metrics with Iterative Example Weighting
Gaurush Hiranandani
Jatin Mathur
Mahdi Milani Fard
Sanmi Koyejo
Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
Abstract
We consider learning to optimize a classification metric defined by a black-box function of the confusion matrix. Such black-box learning settings are ubiquitous, for example, when the learner only has query access to the metric of interest, or in noisy-label and domain adaptation applications where the metric must be evaluated on a small validation sample. Our approach is to adaptively learn example weights on the training dataset such that the resulting weighted objective best approximates the metric on the validation sample. We show how to model and estimate the example weights, use them to iteratively post-shift a pre-trained class probability estimator into a classifier, and analyze the statistical properties of the resulting procedure. Experiments on various label noise, domain shift, and fair classification setups confirm that our proposal outperforms the individual state-of-the-art baselines for each application.
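One building block here is post-shifting a pre-trained class-probability estimator with learned weights. The sketch below shows only the post-shift step; the iterative loop that estimates the weights against the black-box metric is omitted:

```python
import numpy as np

def post_shift(probs, class_weights):
    """Post-shift a pre-trained class-probability estimator: predict the
    class with the highest weighted probability."""
    return (np.asarray(probs) * np.asarray(class_weights)).argmax(axis=-1)

# Usage: up-weighting the rare class 1 flips the borderline second example.
probs = np.array([[0.9, 0.1], [0.6, 0.4]])
print(post_shift(probs, [1.0, 1.0]))  # [0 0]
print(post_shift(probs, [1.0, 2.0]))  # [0 1]
```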
Optimizing Black-box Metrics with Adaptive Surrogates
Qijia Jiang
Olaoluwa Adigun
Mahdi Milani Fard
Maya R. Gupta
Proceedings of the 37th International Conference on Machine Learning (ICML 2020)
Abstract
We address the problem of training models with black-box and hard-to-optimize metrics by expressing the metric as a monotonic function of a small number of easy-to-optimize surrogates. We pose the training problem as an optimization over a relaxed surrogate space, which we solve by estimating local gradients for the metric and performing inexact convex projections. We analyze gradient estimates based on finite differences and local linear interpolations, and show convergence of our approach under smoothness assumptions with respect to the surrogates. Experimental results on classification and ranking problems verify that the proposal performs on par with methods that know the mathematical formulation of the metric, and adds notable value when the form of the metric is unknown.
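A sketch of one ingredient, estimating a local gradient of a query-only metric over the surrogate space via central finite differences; the projections and the full optimization loop are omitted:

```python
import numpy as np

def finite_diff_grad(metric_fn, surrogates, eps=1e-2):
    """Central finite-difference estimate of the gradient of a black-box
    `metric_fn` with respect to a small vector of surrogate values."""
    surrogates = np.asarray(surrogates, dtype=float)
    grad = np.zeros_like(surrogates)
    for i in range(surrogates.size):
        e = np.zeros_like(surrogates)
        e[i] = eps
        grad[i] = (metric_fn(surrogates + e) - metric_fn(surrogates - e)) / (2 * eps)
    return grad

# Usage: recovers the gradient of a known monotonic combination of surrogates.
print(finite_diff_grad(lambda s: 2 * s[0] + s[1] ** 2, np.array([1.0, 3.0])))  # ~[2. 6.]
```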
Pairwise Fairness for Ranking and Regression
Andy Cotter
Maya Gupta
34th AAAI Conference on Artificial Intelligence (2020)
Abstract
Algorithmic tooling for improving fairness is less mature for ranking and regression models than for classifiers. Here, we present pairwise formulations of fairness for ranking and regression models that can express analogues of statistical fairness notions such as equal opportunity and equal accuracy, as well as statistical parity. The proposed framework supports both discrete protected groups and continuous protected attributes. We show that the resulting training problems can be solved efficiently and effectively using constrained optimization or robust optimization algorithms. Experiments illustrate the broad applicability and trade-offs of these methods.
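As an illustration of the pairwise viewpoint, the sketch below computes a cross-group pairwise accuracy, the kind of statistic from which such fairness criteria can be built by comparing it across group pairs; the exact criteria in the paper may differ:

```python
def pairwise_accuracy(scores_a, labels_a, scores_b, labels_b):
    """Over cross-group pairs where the group-A example has the higher
    label, the fraction where it also receives the higher score."""
    correct, total = 0, 0
    for sa, ya in zip(scores_a, labels_a):
        for sb, yb in zip(scores_b, labels_b):
            if ya > yb:
                total += 1
                correct += sa > sb
    return correct / max(total, 1)

# Usage: one of two qualifying pairs is ordered correctly.
print(pairwise_accuracy([0.9, 0.3], [1, 1], [0.5], [0]))  # 0.5
```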
Abstract
In machine learning applications such as ranking fairness or fairness over intersectional groups, one often encounters optimization problems with an extremely large number of constraints. In particular, with ranking fairness tasks, there may even be a variable number of constraints, e.g. one for each query in the training set. In these cases, the standard approach of optimizing a Lagrangian while maintaining one Lagrange multiplier per constraint may no longer be practical. Our proposal is to associate a feature vector with each constraint, and to learn a "multiplier model" that maps each such vector to the corresponding Lagrange multiplier. We prove optimality and feasibility guarantees under assumptions on the flexibility of the multiplier model, and empirically demonstrate that our method is effective on real-world case studies.
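A minimal sketch of the multiplier-model idea: a shared model maps each constraint's feature vector to a nonnegative Lagrange multiplier, instead of keeping one free multiplier per constraint. The linear form and the softplus link below are illustrative assumptions:

```python
import numpy as np

def multiplier_model(weights, constraint_features):
    """Map each constraint's feature vector to a nonnegative multiplier
    via a shared linear model with a softplus link."""
    z = np.asarray(constraint_features) @ np.asarray(weights)
    return np.log1p(np.exp(z))   # softplus keeps multipliers >= 0

def lagrangian(objective, violations, weights, constraint_features):
    """Objective plus multiplier-weighted constraint violations; training
    descends in the model parameters and ascends in `weights`."""
    lams = multiplier_model(weights, constraint_features)
    return objective + float(np.dot(lams, violations))
```

With this parameterization, the number of learned multiplier parameters is fixed by the feature dimension, not by the (possibly variable) number of constraints.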