Yuxue Jin

Authored Publications
    Evaluating the return on ad spend (ROAS), the causal effect of advertising on sales, is critical to advertisers for understanding the performance of their existing marketing strategy as well as how to improve and optimize it. Media Mix Modeling (MMM) has been used as a convenient analytical tool to address the problem using observational data. However, it is well recognized that MMM suffers from various fundamental challenges: data collection, model specification, and selection bias due to ad targeting, among others (Chan & Perry 2017; Wolfe 2016). In this paper, we study the challenge associated with measuring the impact of search ads in MMM, namely the selection bias due to ad targeting. Using causal diagrams of the search ad environment, we derive a statistically principled method for bias correction based on the back-door criterion (Pearl 2013). We use case studies to show that the method provides promising results by comparison with results from randomized experiments. We also report a more complex case study where the advertiser had spent on more than a dozen media channels but results from a randomized experiment are not available. Both our theory and empirical studies suggest that in some common, practical scenarios, one may be able to obtain an approximately unbiased estimate of search ad ROAS.
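The back-door adjustment mentioned in the abstract above can be illustrated with a toy example. This is only a sketch of the general idea (stratify on a confounder, then average the within-stratum effects), not the paper's estimator. The data, the binary confounder `z` (imagined here as a high or low level of underlying demand), and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (z, x, y) = (confounder level, ad shown 0/1, sales).
# Ads are shown mostly when z = 1, and z = 1 also raises sales on its own.
ROWS = [
    (0, 0, 10), (0, 0, 10), (0, 0, 10), (0, 1, 12),
    (1, 0, 20), (1, 1, 22), (1, 1, 22), (1, 1, 22),
]

def mean(values):
    return sum(values) / len(values)

def naive_effect(rows):
    """Difference in mean sales between exposed and unexposed rows, ignoring z."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)

def backdoor_effect(rows):
    """Average the within-stratum effect over the distribution of z.

    Assumes every stratum contains both exposed and unexposed rows.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, x, y in rows:
        strata[z][x].append(y)
    total = len(rows)
    effect = 0.0
    for groups in strata.values():
        weight = (len(groups[0]) + len(groups[1])) / total
        effect += weight * (mean(groups[1]) - mean(groups[0]))
    return effect

print(naive_effect(ROWS))     # 7.0, inflated by targeting on z
print(backdoor_effect(ROWS))  # 2.0, the effect built into each stratum
```

In this toy data the naive comparison overstates the effect because exposure is concentrated where sales are already high; conditioning on `z` recovers the within-stratum difference.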
    Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects
    Jim Koehler
    research.google.com, Google Inc., 76 Ninth Avenue, New York, NY 10011 (2017)
    Media mix models are used by advertisers to measure the effectiveness of their advertising and to inform future budget allocation decisions. Advertising usually has lag effects and diminishing returns, which are hard to capture using linear regression. In this paper, we propose a media mix model with flexible functional forms to model the carryover and shape effects of advertising. The model is estimated using a Bayesian approach in order to make use of prior knowledge accumulated in previous or related media mix models. We illustrate how to calculate attribution metrics such as ROAS and mROAS from posterior samples on simulated data sets. Simulation studies show that the model can be estimated very well for large data sets, but prior distributions have a big impact on the posteriors when the sample size is small and may lead to biased estimates. We apply the model to data from a shampoo advertiser, and use the Bayesian Information Criterion (BIC) to choose the appropriate specification of the functional forms for the carryover and shape effects. We further illustrate that the optimal media mix based on the model has a large variance due to the variance of the parameter estimates.
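To make "carryover" and "shape" effects concrete, here is a minimal sketch. The abstract does not state the functional forms, so the geometric-decay carryover weights and the Hill-type saturation curve below are assumptions for illustration, as are all function and parameter names.

```python
def adstock(spend, decay, max_lag):
    """Carryover: weighted average of current and past spend.

    Weights are decay**lag for lag = 0..max_lag (an assumed geometric form).
    Spend before the first period is treated as zero.
    """
    weights = [decay ** lag for lag in range(max_lag + 1)]
    weight_sum = sum(weights)
    out = []
    for t in range(len(spend)):
        total = 0.0
        for lag, w in enumerate(weights):
            if t - lag >= 0:
                total += w * spend[t - lag]
        out.append(total / weight_sum)
    return out

def hill(x, half_saturation, slope):
    """Shape: saturating response in [0, 1), equal to 0.5 at x = half_saturation."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / half_saturation) ** (-slope))

# A single burst of spend is spread over the following periods,
# then each period's carried-over spend is passed through the shape curve.
weekly_spend = [100.0, 0.0, 0.0, 0.0, 0.0]
carried = adstock(weekly_spend, decay=0.5, max_lag=2)
response = [hill(x, half_saturation=30.0, slope=2.0) for x in carried]
print(carried)
print(response)
```

Because the weights are normalized, the carried-over series sums to the original spend when the burst sits far enough from the end of the series. The order of the two transforms (carryover first, then shape) is also a modeling choice made here for illustration.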
    One of the major problems in developing media mix models is that the data generally available to the modeler lacks sufficient quantity and information content to reliably estimate the parameters in a model of even moderate complexity. Pooling data from different brands within the same product category provides more observations and greater variability in media spend patterns. Depending on the data-sharing restrictions across brands, we either directly use the results from a hierarchical Bayesian model built on the category dataset, or pass the information learned from the category model to a brand-specific media mix model via informative priors within a Bayesian framework. We demonstrate using both simulation and real case studies that our category analysis can improve parameter estimation and reduce uncertainty of model prediction and extrapolation.
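The role of an informative prior can be sketched with the textbook normal-normal conjugate update. This is far simpler than a media mix model, and is not the paper's procedure. It assumes a single coefficient with a normal prior (standing in for what a category model might supply) and brand data with known noise variance; all names and numbers are hypothetical.

```python
def posterior_normal(prior_mean, prior_var, sample_mean, noise_var, n):
    """Posterior mean and variance for a normal mean with known noise variance.

    The posterior mean is a precision-weighted average of the prior mean
    and the sample mean of n observations.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, post_var

# With one brand-level observation the prior pulls the estimate strongly;
# with many observations the data dominate.
print(posterior_normal(1.0, 1.0, 3.0, 1.0, 1))   # (2.0, 0.5)
print(posterior_normal(1.0, 1.0, 3.0, 1.0, 99))
```

The posterior variance is always smaller than the prior variance, which is one way to read the abstract's claim that category information can reduce uncertainty when brand-level data is scarce.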
    Media mix modeling is a statistical analysis of historical data to measure the return on investment (ROI) on advertising and other marketing activities. Current practice usually utilizes data aggregated at a national level, which often suffers from small sample size and insufficient variation in the media spend. When sub-national data is available, we propose a geo-level Bayesian hierarchical media mix model (GBHMMM), and demonstrate that the method generally provides estimates with tighter credible intervals compared to a model with national level data alone. This reduction in error is due to having more observations and useful variability in media spend, which can protect advertisers from unsound reallocation decisions. Under some weak conditions, the geo-level model can reduce the ad targeting bias. When geo-level data is not available for all the media channels, the geo-level model estimates generally deteriorate as more media variables are imputed using the national level data.
    Many socio-economic studies rely on panel data as they also provide detailed demographic information about consumers. For example, advertisers use TV and web metering panels to estimate ad effectiveness in selected target demographics. However, panels often record only a fraction of all events due to non-registered devices, technical problems, or work usage. Goerg et al. (2015) present a beta-binomial negative-binomial hurdle (BBNBH) model to impute missing events in count data with excess zeros. In this work, we study empirical properties of the maximum likelihood estimator (MLE) for the BBNBH model, extend the model to categorical covariates, introduce a penalized MLE to get accurate estimates by demographic group, and apply the methodology to a German media panel to learn about demographic patterns in YouTube viewership.
    How Many People Visit YouTube? Imputing Missing Events in Panels With Excess Zeros
    Georg M. Goerg
    Nicolas Remy
    Jim Koehler
    SAGE Publications, edited by Herwig Friedl and Helga Wagner, Linz, Austria (2015), pp. 1-6
    Media-metering panels track TV and online usage of people to analyze viewing behavior. However, panel data is often incomplete due to non-registered devices, non-compliant panelists, or work usage. We thus propose a probabilistic model to impute missing events in data with excess zeros using a negative-binomial hurdle model for the unobserved events and beta-binomial sub-sampling to account for missingness. We then use the presented models to estimate the number of people in Germany who visit YouTube.
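The structure described in the abstract above (a hurdle model for true event counts, followed by sub-sampling that hides some events) can be sketched as follows. Several simplifications are assumptions of this sketch rather than the paper's specification: positive counts are modeled as one plus a negative-binomial draw, and the beta-binomial sub-sampling is replaced by binomial thinning with a fixed recording probability. All names and parameter values are hypothetical.

```python
import math

def nb_pmf(k, r, p):
    """Negative-binomial pmf on k = 0, 1, 2, ... with size r > 0 and probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def hurdle_pmf(n, q, r, p):
    """P(true count = n): zero with probability 1 - q, else 1 + NB(r, p)."""
    if n == 0:
        return 1.0 - q
    return q * nb_pmf(n - 1, r, p)

def prob_no_observed_events(q, r, p, record_prob, n_max=500):
    """P(panel records zero events) when each true event is recorded
    independently with probability record_prob (fixed-rate thinning)."""
    return sum(hurdle_pmf(n, q, r, p) * (1.0 - record_prob) ** n
               for n in range(n_max + 1))

# True zeros occur with probability 0.7, but the panel shows more zeros
# than that because some real visits go unrecorded.
print(1.0 - 0.3)
print(prob_no_observed_events(q=0.3, r=2.0, p=0.5, record_prob=0.6))
```

The gap between the two printed values is the share of panelists who appear inactive only because of missingness, which is the quantity an imputation model of this kind has to account for.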
    The Optimal Mix of TV and Online Ads to Maximize Reach
    Jim Koehler
    Georg M. Goerg
    Nicolas Remy
    research.google.com, 76 Ninth Avenue (2013), pp. 1-16
    Brand marketers often wonder how they should allocate budget between TV and online ads in order to maximize reach or maintain the same reach at a lower cost. We use probability models based on historical cross-media panel data to suggest the optimal budget allocation between TV and online ads to maximize reach to the target demographics. We take a historical TV campaign and estimate the reach and GRPs of a hypothetical cross-media campaign if some budget were shifted from TV to online. The models are validated against simulations and historical cross-media campaigns. They are illustrated on one case study to show how an optimized cross-media campaign can obtain a higher reach at the same cost or maintain the same reach at a lower cost than the TV-only campaign.
    The Incremental Reach and Cost Efficiency of Online Video Ads over TV Ads
    Sheethal Shobowale
    Jim Koehler
    Harry Case
    Google Inc. (2012), pp. 1-17
    As people spend more time online, an increasing number of brand marketers are including video ads in their advertising campaigns. These advertisers would like to know the incremental reach and cost efficiency of their video and display ads compared to their TV ads. In this paper, we measure the incremental reach to a target demographic and estimate the cost per incremental reach point of YouTube (YT) and the Google Display Network (GDN) compared to TV ad campaigns. We consider two media planning scenarios: what it would have cost for the TV ad campaign to have delivered the equivalent of the online incremental reach, and what savings could have been achieved by having spent less on TV ads and complementing them with online ads for a given reach goal.
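As a small worked example of the metric named in the abstract above, cost per incremental reach point can be read as the online spend divided by the reach the online campaign adds beyond TV. That reading, the function name, and the numbers are assumptions of this sketch; the paper may define or estimate the quantity differently.

```python
def cost_per_incremental_reach_point(online_cost, tv_reach_pct, combined_reach_pct):
    """Online spend divided by the percentage points of reach added beyond TV."""
    incremental_points = combined_reach_pct - tv_reach_pct
    if incremental_points <= 0:
        raise ValueError("online campaign adds no incremental reach")
    return online_cost / incremental_points

# Hypothetical campaign: TV alone reaches 60% of the target demographic,
# TV plus online reaches 64%, and the online portion cost 50,000.
print(cost_per_incremental_reach_point(50_000.0, 60.0, 64.0))  # 12500.0
```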