Jean Pouget-Abadie
Jean Pouget-Abadie is a research scientist at Google NYC on the Algorithms and Optimization team led by Vahab Mirrokni. He holds a PhD in Computer Science from Harvard University, advised by Edoardo Airoldi and Salil Vadhan. Prior to that, he was an undergraduate at Ecole Polytechnique, Paris. His recent research interests include algorithms and statistics, with a particular focus on causal inference. More information can be found at his personal homepage.
Authored Publications
The conclusions of randomized controlled trials may be biased when the outcome of one unit depends on the treatment status of other units, a problem known as interference. In this work, we study interference in the setting of one-sided bipartite experiments, in which the experimental units (where treatments are randomized and outcomes are measured) do not interact directly. Instead, their interactions are mediated through their connections to interference units on the other side of the graph. Examples of this type of interference are common in marketplaces and two-sided platforms. The cluster-randomized design is a popular method to mitigate interference when the graph is known, but it has not been well studied in the one-sided bipartite experiment setting. We formalize a natural model for interference in one-sided bipartite experiments using the exposure mapping framework. We first exhibit settings under which existing cluster-randomized designs fail to properly mitigate interference under this model. We then show that minimizing the bias of the difference-in-means estimator under our model results in a balanced partitioning clustering objective with a natural interpretation. We further prove that our design is minimax optimal over the class of linear potential outcomes models with bounded interference. We conclude by providing theoretical and experimental evidence of the robustness of our design to a variety of interference graphs and potential outcomes models.
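As a rough illustration of the two ingredients this abstract names, the sketch below shows a generic cluster-randomized assignment and the difference-in-means estimator. It is not the paper's design: the function names, the `clusters` mapping, and the treatment probability `p` are assumptions made for this example, and the clustering itself is taken as given.

```python
import random


def cluster_randomize(clusters, p=0.5, seed=0):
    """Assign whole clusters to treatment (1) or control (0).

    `clusters` maps each experimental unit to a cluster id (hypothetical
    input format). Every unit inherits the arm drawn for its cluster.
    """
    rng = random.Random(seed)
    arm = {c: int(rng.random() < p) for c in sorted(set(clusters.values()))}
    return {unit: arm[c] for unit, c in clusters.items()}


def difference_in_means(outcomes, assignment):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for u, y in outcomes.items() if assignment[u] == 1]
    control = [y for u, y in outcomes.items() if assignment[u] == 0]
    if not treated or not control:
        raise ValueError("both arms need at least one unit")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Units "a" and "b" share a cluster, as do "c" and "d".
clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
assignment = cluster_randomize(clusters)

# Estimator on a fixed, hand-written assignment and made-up outcomes.
estimate = difference_in_means(
    {"a": 3.0, "b": 5.0, "c": 1.0, "d": 1.0},
    {"a": 1, "b": 1, "c": 0, "d": 0},
)
```

Randomizing at the cluster level keeps units that share a cluster in the same arm, which is the mechanism by which such designs try to limit interference.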
Design and analysis of bipartite experiments under a linear exposure-response model
Christopher Harshaw
Fredrik Savje
Proceedings of the 23rd ACM Conference on Economics and Computation (2022), pp. 606
A bipartite experiment consists of one set of units being assigned treatments and another set of units for which we measure outcomes. The two sets of units are connected by a bipartite graph, governing how the treated units can affect the outcome units. In this paper, we consider estimation of the average total treatment effect in the bipartite experimental framework under a linear exposure-response model. We introduce the Exposure Reweighted Linear (ERL) estimator, and show that the estimator is unbiased, consistent and asymptotically normal, provided that the bipartite graph is sufficiently sparse. To facilitate inference, we introduce an unbiased and consistent estimator of the variance of the ERL point estimator. In addition, we introduce a cluster-based design, Exposure-Design, that uses heuristics to increase the precision of the ERL estimator by realizing a desirable exposure distribution. Finally, we demonstrate the application of the described methodology to marketplace experiments using a publicly available Amazon user-item review dataset.
Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Guido Imbens
Jann Spiess
Khashayar Khosravi
Miles Lubin
Nick Doudchenko
35th Conference on Neural Information Processing Systems (NeurIPS 2021) (2021)
We investigate the optimal design of experimental studies that have pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-in-means estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units in conjunction with the weights. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation which selects both the treatment and control sets and unit weightings. We prove that these proposed approaches lead to qualitatively different experimental units being selected for treatment. We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials.
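The estimator form described in this abstract, a difference between weighted average outcomes of treated and control units, can be sketched as follows. This is only an illustration of that formula: the function name and input format are assumptions for this example, the weights are supplied by hand, and the mixed-integer program that selects the treated set and weights is not shown.

```python
def weighted_effect_estimate(outcomes, treated, weights):
    """Weighted average outcome of treated units minus that of control units.

    `outcomes` and `weights` map unit -> number; `treated` is the set of
    treated units. Weights are normalized within each arm.
    """
    t_units = [u for u in outcomes if u in treated]
    c_units = [u for u in outcomes if u not in treated]
    if not t_units or not c_units:
        raise ValueError("both arms need at least one unit")
    t_avg = sum(weights[u] * outcomes[u] for u in t_units) / sum(weights[u] for u in t_units)
    c_avg = sum(weights[u] * outcomes[u] for u in c_units) / sum(weights[u] for u in c_units)
    return t_avg - c_avg


outcomes = {"a": 2.0, "b": 4.0, "c": 1.0, "d": 3.0}
treated = {"a", "b"}

# Uniform weights recover the plain difference-in-means estimator.
uniform = weighted_effect_estimate(outcomes, treated, {u: 1.0 for u in outcomes})

# Non-uniform weights, in the spirit of synthetic-control style estimators.
reweighted = weighted_effect_estimate(
    outcomes, treated, {"a": 1.0, "b": 3.0, "c": 1.0, "d": 1.0}
)
```

With uniform weights the function reduces to difference-in-means, which is why the abstract can treat that estimator and synthetic-control techniques as instances of one formulation.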
Variance Reduction in Bipartite Experiments through Correlation Clustering
Warren Schudy
Thirty-third Conference on Neural Information Processing Systems (2019) (to appear)
Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization, but this design has not been studied in the bipartite setting, where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments.
Randomized Experimental Design via Geographic Clustering
David Rolnick
Amir Najmi
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019)
Web-based services often run randomized experiments to improve their products. A popular way to run these experiments is to use geographical regions as units of experimentation, since this does not require tracking of individual users or browser cookies. Since users may issue queries from multiple geographical locations, georegions cannot be considered independent and interference may be present in the experiment. In this paper, we study this problem, and first present GeoCUTS, a novel algorithm that forms geographical clusters to minimize interference while preserving balance in cluster size. We use a random sample of anonymized traffic from Google Search to form a graph representing user movements, then construct a geographically coherent clustering of the graph. Our main technical contribution is a statistical framework to measure the effectiveness of clusterings. Furthermore, we perform empirical evaluations showing that the performance of GeoCUTS is comparable to hand-crafted geo-regions with respect to both novel and existing metrics.