Minghui Shi
Minghui received her Bachelor's Degree in Science from University of Science and Technology of China in July of 2007. Afterwards, she came to Department of Statistical Science at Duke University to pursue a PhD degree. In May of 2011, she received a PhD degree in Statistics. Now, she works as a quantitative analyst at Google. Her major research interests include Bayesian variable selection and factor analysis.
Research Areas
Authored Publications
Sort By
Preview abstract
We describe a Digital Advertising System Simulation (DASS) for modeling advertising and its impact on user behavior. DASS is both flexible and general, and can be applied to research on a wide range of topics, such as digital attribution, ad fatigue, campaign optimization, and marketing mix modeling. This paper introduces the basic DASS simulation framework and illustrates its application to digital attribution. We show that common position-based attribution models fail to capture the true causal effects of advertising across several simple scenarios. These results lay a groundwork for the evaluation of more complex attribution models, and the development of improved models.
View details
Data Enriched Linear Regression
Art Owen
Electronic Journal of Statistics, 9 (2015), pp. 1078-1112 (to appear)
Preview abstract
We present a linear regression method for predictions on a small data set making use of a second possibly biased data set that may be much larger. Our method fits linear regressions to the two data sets while penalizing the difference between predictions made by those two models. The resulting algorithm is a shrinkage method similar to those used in small area estimation. We find a Stein-type result for Gaussian responses: when the model has 5 or more coefficients and 10 or more error degrees of freedom, it becomes inadmissible to use only the small data set, no matter how large the bias is. We also present both plug-in and AICc-based methods to tune our penalty parameter. Most of our results use an L2 penalty, but we obtain formulas for L1 penalized estimates when the model is specialized to the location setting. Ordinary Stein shrinkage provides an inadmissibility result for only 3 or more coefficients, but we find that our shrinkage method typically produces much lower squared errors in as few as 5 or 10 dimensions when the bias is small and essentially equivalent squared errors when the bias is large.
View details
Preview abstract
There is increasing interest in measuring the overlap and/or incremental reach of cross-media campaigns. The direct method is to use a cross-media panel but these are expensive to scale across all media. Typically, the cross-media panel is too small to produce reliable estimates when the interest comes down to subsets of the population. An alternative is to combine information from a small cross-media panel with a larger, cheaper but potentially biased single media panel. In this article, we develop a data enrichment approach specifically for incremental reach estimation. The approach not only integrates information from both panels that takes into account potential panel bias, but borrows strength from modeling conditional dependence of cross-media reaches. We demonstrate the approach with data from six campaigns for estimating YouTube video ad incremental reach over TV. In a simulation directly modeled on the actual data, we find that
data enrichment yields much greater accuracy than one would get by either ignoring the larger panel, or by using it in a data fusion.
View details