# Karthikeyan Shanmugam

I am a Research Scientist at Google Research India (Bengaluru). I am part of the Machine Learning Foundations and Optimization Team.
Previously, I was a Research Staff Member at IBM Research AI, NY, from 2017 to 2022, and a Herman Goldstine Postdoctoral Fellow at IBM Research, NY, from 2016 to 2017. I obtained my Ph.D. in ECE from UT Austin in 2016, where my advisor was Alex Dimakis. I obtained my MS degree in Electrical Engineering (2010-2012) from the University of Southern California, and my B.Tech. and M.Tech. degrees in Electrical Engineering from IIT Madras in 2010.
My research interests broadly lie in graph algorithms, machine learning, optimization, coding theory, and information theory. Within machine learning, my recent focus is on causal inference, bandits/RL, and explainable AI. Please visit my personal webpage for more details.

Authored Publications

Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

The Twelfth International Conference on Learning Representations (ICLR) (2024)

Learning from Label Proportions (LLP) is a learning problem where only aggregate-level labels are available for groups of instances, called bags, during training, and the aim is to achieve the best instance-level performance on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. In the first step (Pseudo Labeling) of every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information, through the constraint that instances with similar covariates should have similar labels, and b) the bag-level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution and obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. We then iterate the two steps, using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines for the LLP binary classification problem on various dataset types: small tabular, large tabular, and images. These improvements come with minimal computational overhead from Belief Propagation beyond standard supervised learning, even for large bag sizes and a million samples.
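To make the Pseudo Labeling step concrete, the sketch below computes, by brute-force enumeration, the exact per-instance marginals that BP approximates for a single tiny bag. The form of the Gibbs distribution is an assumption for illustration: pairwise weights `w_ij` (hypothetical) reward similar instances for sharing a label, and the bag label is treated as a hard constraint that exactly `k` labels are positive. The paper's actual potentials may differ, and enumeration is exponential in the bag size, which is precisely why BP is needed for realistic bags.

```python
import itertools
import math


def exact_bag_marginals(n, k, weights):
    """Exact P(y_i = 1) for a toy Gibbs distribution over one bag of n binary labels.

    Assumed (hypothetical) form:
        p(y) proportional to exp(sum over (i, j) in weights of w_ij * [y_i == y_j])
        if sum(y) == k, and 0 otherwise.
    w_ij >= 0 rewards similar instances for agreeing; k is the bag's positive count.
    """
    partition = 0.0
    unnormalized = [0.0] * n
    for labels in itertools.product((0, 1), repeat=n):
        if sum(labels) != k:  # hard bag-level constraint
            continue
        energy = sum(w for (i, j), w in weights.items() if labels[i] == labels[j])
        p = math.exp(energy)
        partition += p
        for i, y in enumerate(labels):
            if y == 1:
                unnormalized[i] += p
    # Normalizing gives the marginals, which serve as soft pseudo labels.
    return [u / partition for u in unnormalized]


if __name__ == "__main__":
    # Bag of 3 instances with exactly one positive; instances 0 and 1 are very similar.
    marginals = exact_bag_marginals(3, 1, {(0, 1): 5.0})
    print([round(m, 4) for m in marginals])  # → [0.0066, 0.0066, 0.9867]
```

Because instances 0 and 1 prefer to share a label and only one positive is allowed, the model assigns the positive label almost entirely to instance 2.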

General Identifiability and Achievability for Causal Representation Learning

Burak Varici

Emre Acarturk

Ali Tajer

AISTATS 2024 (Oral); oral talk at the NeurIPS 2023 Causal Representation Learning Workshop (2024)

This paper focuses on causal representation learning (CRL) under a general nonparametric latent causal model and a general transformation model that maps the latent data to the observational data. It establishes identifiability and achievability results using two hard uncoupled interventions per node in the latent causal graph. Notably, one does not know which pair of intervention environments has the same node intervened (hence, uncoupled). For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions. For achievability, an algorithm is designed that uses observational and interventional data and recovers the latent causal model and variables with provable guarantees. This algorithm leverages score variations across different environments to estimate the inverse of the transformation and, subsequently, the latent variables. The analysis additionally recovers the identifiability result for two hard coupled interventions, that is, when metadata about the pair of environments that have the same node intervened is known. This paper also shows that when observational data is available, additional faithfulness assumptions that are adopted by the existing literature are unnecessary.
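As intuition for why score variations across environments are informative, the toy sketch below works directly in latent space with an assumed linear Gaussian model (unit noise variances), where the score, the gradient of log p(z), has a closed form. It is not the paper's algorithm, which handles nonparametric models and an unknown transformation; it only shows that, in this toy model, a hard intervention on `Z2` changes the score only in the coordinates of `Z2` and its parent `Z1`. All function names are hypothetical.

```python
# Toy latent model (an assumption for illustration, not the paper's setting):
#   Z1 ~ N(0, 1),  Z2 = a * Z1 + N(0, 1),  Z3 ~ N(0, 1) independent.
# A hard intervention on Z2 cuts the edge Z1 -> Z2 and sets Z2 ~ N(0, 1).


def score_observational(z, a):
    """Gradient of log p(z) under the observational model."""
    z1, z2, z3 = z
    r2 = z2 - a * z1  # residual of Z2's mechanism
    return (-z1 + a * r2, -r2, -z3)


def score_hard_intervention_on_z2(z):
    """Gradient of log p(z) after the hard intervention on Z2."""
    z1, z2, z3 = z
    return (-z1, -z2, -z3)


def score_difference(z, a):
    """Observational score minus interventional score at point z."""
    obs = score_observational(z, a)
    intv = score_hard_intervention_on_z2(z)
    return tuple(o - i for o, i in zip(obs, intv))


def changed_coordinates(points, a, tol=1e-12):
    """Indices whose score component differs between the two environments at some point."""
    changed = set()
    for z in points:
        for idx, d in enumerate(score_difference(z, a)):
            if abs(d) > tol:
                changed.add(idx)
    return changed


if __name__ == "__main__":
    pts = [(0.5, -1.0, 2.0), (-1.2, 0.3, 0.7), (2.0, 2.0, -0.4)]
    print(score_difference(pts[0], a=2.0))  # → (-4.0, 1.0, 0.0)
    print(changed_coordinates(pts, a=2.0))  # → {0, 1}: Z1 and Z2, never Z3
```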

Fairness under Covariate Shift: Improving Fairness-Accuracy tradeoff with few Unlabeled Test Samples

Jatin Chauhan

Jay Nandy

The 38th Annual AAAI Conference on Artificial Intelligence (2024)

Covariate shift in the test data is a common practical phenomenon that can significantly degrade both the accuracy and the fairness of the model. Ensuring fairness across different sensitive groups under covariate shift is of paramount importance due to its societal implications in domains like criminal justice. We operate in the unsupervised regime where only a small set of unlabeled test samples, along with a labeled training set, is available. Towards improving fairness in this highly challenging yet realistic scenario, we make three contributions. The first is a novel composite, weighted-entropy-based objective for prediction accuracy, which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the Pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. Asymmetric covariate shift occurs when the distribution of covariates of one group shifts significantly compared to the other groups, which happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, we show that our proposed method significantly outperforms them. Our third contribution is theoretical: we show that our weighted entropy term, along with the prediction loss on the training set, approximates the test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance, which affects many other baselines.
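"Outperforming in the Pareto sense" means that a method's (accuracy, unfairness) operating points are not dominated by those of the baselines. The minimal sketch below shows the dominance check and the resulting Pareto front; the unfairness axis (for example, a demographic parity gap) and all numbers are assumptions for illustration, not results from the paper.

```python
def dominates(p, q):
    """True if result p is at least as good as q on both axes and strictly better on one.

    Each result is (accuracy, unfairness): higher accuracy and lower unfairness are better.
    """
    at_least_as_good = p[0] >= q[0] and p[1] <= q[1]
    strictly_better = p[0] > q[0] or p[1] < q[1]
    return at_least_as_good and strictly_better


def pareto_front(results):
    """Keep the results that no other result dominates, preserving input order."""
    return [q for q in results if not any(dominates(p, q) for p in results)]


if __name__ == "__main__":
    # Illustrative (accuracy, unfairness) pairs only.
    runs = [(0.90, 0.10), (0.85, 0.05), (0.80, 0.20), (0.88, 0.12)]
    print(pareto_front(runs))  # → [(0.9, 0.1), (0.85, 0.05)]
```

One method beats another in the Pareto sense when its front lies on or beyond the other's, so no choice of tradeoff favors the baseline.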

Front-door Adjustment Beyond Markov Equivalence with Limited Graph Knowledge

Abhin Shah

Murat Kocaoglu

Neural Information Processing Systems 2023 (NeurIPS 2023) (2023) (to appear)

Causal effect estimation from data typically requires assumptions about the cause-effect relations, either explicitly in the form of a causal graph structure within the Pearlian framework, or implicitly in terms of (conditional) independence statements between counterfactual variables within the potential outcomes framework. When the treatment variable and the outcome variable are confounded, front-door adjustment is an important special case where, given the graph, the causal effect of the treatment on the target can be estimated using post-treatment variables. However, the exact formula for front-door adjustment depends on the structure of the graph, which is difficult to learn in practice. In this work, we provide testable conditional independence statements to compute the causal effect using front-door-like adjustment without knowing the graph, under limited structural side information. We show that our method is applicable in scenarios where knowing the Markov equivalence class is not sufficient for causal effect estimation. We demonstrate the effectiveness of our method on a class of random graphs as well as on real causal fairness benchmarks.
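For background, the classical front-door formula applies when the graph is known: a hidden confounder U affects both X and Y, and X influences Y only through a mediator Z. Then P(y | do(x)) = Σ_z P(z | x) Σ_{x'} P(y | x', z) P(x'). The sketch below evaluates this textbook formula on a small hypothetical binary model (all probabilities are made up) and compares it with the naive conditional P(y | x). It is not the graph-free procedure proposed in the paper.

```python
from itertools import product


def bernoulli(p1, value):
    """P(V = value) for a binary V with P(V = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def joint_from_scm():
    """Observed joint P(X, Z, Y) of a hypothetical SCM: U -> X, U -> Y, X -> Z -> Y."""
    p_u1 = 0.5                      # P(U = 1)
    p_x1 = {0: 0.2, 1: 0.8}         # P(X = 1 | U = u)
    p_z1 = {0: 0.1, 1: 0.9}         # P(Z = 1 | X = x)

    def p_y1(z, u):                 # P(Y = 1 | Z = z, U = u)
        return 0.1 + 0.6 * z + 0.2 * u

    joint = {}
    for u, x, z, y in product((0, 1), repeat=4):
        p = bernoulli(p_u1, u) * bernoulli(p_x1[u], x) * bernoulli(p_z1[x], z) * bernoulli(p_y1(z, u), y)
        joint[(x, z, y)] = joint.get((x, z, y), 0.0) + p  # marginalize out hidden U
    return joint


def prob(joint, x=None, z=None, y=None):
    """Total probability of the entries consistent with the given values."""
    return sum(p for (xx, zz, yy), p in joint.items()
               if (x is None or xx == x) and (z is None or zz == z) and (y is None or yy == y))


def front_door(joint, x, y):
    """P(Y = y | do(X = x)) via the front-door adjustment formula."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(joint, x=x, z=z) / prob(joint, x=x)
        inner = sum(prob(joint, x=xp, z=z, y=y) / prob(joint, x=xp, z=z) * prob(joint, x=xp)
                    for xp in (0, 1))
        total += p_z_given_x * inner
    return total


def naive_conditional(joint, x, y):
    """P(Y = y | X = x), which is biased here because U confounds X and Y."""
    return prob(joint, x=x, y=y) / prob(joint, x=x)


if __name__ == "__main__":
    joint = joint_from_scm()
    print(round(front_door(joint, 1, 1), 6))         # → 0.74
    print(round(naive_conditional(joint, 1, 1), 6))  # → 0.8
```

In this toy model the true interventional probability P(Y = 1 | do(X = 1)) is 0.74, which the front-door formula recovers from observed (X, Z, Y) alone, while conditioning gives a confounded 0.8.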

Causal Bandits for Linear Structural Equation Models

Ali Tajer

Burak Varici

Prasanna Sattigeri

Journal of Machine Learning Research (2023) (to appear)

This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model to minimize cumulative regret with respect to the best intervention in hindsight. This is, naturally, posed as a causal bandit problem. The focus is on causal bandits for linear structural equation models (SEMs) and soft interventions. It is assumed that the graph's structure is known and has $N$ nodes. Two linear mechanisms, one soft intervention and one observational, are assumed for each node, giving rise to $2^N$ possible interventions. The majority of existing causal bandit algorithms assume that at least the interventional distributions of the reward node's parents are fully specified. However, there are $2^N$ such distributions (one corresponding to each intervention), and acquiring them becomes prohibitive even in moderate-sized graphs. This paper dispenses with the assumption of knowing these distributions or their marginals. Two algorithms are proposed, for the frequentist (UCB-based) and Bayesian (Thompson Sampling-based) settings. The key idea of these algorithms is to avoid directly estimating the $2^N$ reward distributions and instead estimate the parameters that fully specify the SEMs (linear in $N$) and use them to compute the rewards. In both algorithms, under boundedness assumptions on the noise and the parameter space, the cumulative regrets scale as $d^{L+1/2} \sqrt{NT}$, where $d$ is the graph's maximum degree and $L$ is the length of its longest causal path. Additionally, a minimax lower bound of $\Omega(d^{L/2-1/2} \sqrt{NT})$ is presented, which suggests that the achievable and lower bounds conform in their scaling behavior with respect to the horizon $T$ and graph parameters $d$ and $L$.
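The key idea, computing rewards from SEM parameters rather than estimating $2^N$ reward distributions, can be sketched as follows. This is neither the UCB nor the Thompson Sampling algorithm: the two weight matrices are simply given here instead of being estimated online. Assumptions (all hypothetical): nodes are indexed in topological order, X_i = Σ_j B[j][i] X_j + ε_i with known noise means `nu`, column i of the weight matrix comes from `b_int` when node i is softly intervened and from `b_obs` otherwise, and the reward is the mean of the last node.

```python
from itertools import product


def node_means(b_obs, b_int, nu, action):
    """Mean of every node under a soft-intervention vector `action` (1 = intervened).

    Assumes topological order and X_i = sum_j B[j][i] * X_j + eps_i with E[eps_i] = nu[i],
    where B[j][i] is the weight of edge j -> i.
    """
    n = len(nu)
    mu = [0.0] * n
    for i in range(n):
        b = b_int if action[i] else b_obs  # node i's mechanism picks its incoming weights
        mu[i] = sum(b[j][i] * mu[j] for j in range(i)) + nu[i]
    return mu


def best_intervention(b_obs, b_int, nu):
    """Score all 2^N actions from the 2 weight vectors per node; return (action, reward)."""
    n = len(nu)
    candidates = ((a, node_means(b_obs, b_int, nu, a)[-1]) for a in product((0, 1), repeat=n))
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Chain 0 -> 1 -> 2, where node 2 is the reward node.
    b_obs = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    b_int = [[0, -1, 0], [0, 0, 3], [0, 0, 0]]
    nu = [1.0, 0.0, 0.0]
    print(best_intervention(b_obs, b_int, nu))  # → ((0, 0, 1), 3.0)
```

The enumeration over actions is still exponential, but only $O(N)$ weight vectors would need to be learned, which is the quantity the abstract's regret analysis scales with.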
