Sangho Yoon

Authored Publications
    In this article, we discuss an approach to the design of experiments in a network. In particular, we describe a method to prevent potential contamination (that is, inconsistent treatment exposure) of samples caused by network effects. We use data from Google Cloud Platform (GCP) as an example of how we run A/B tests when users are connected to one another. Our methodology can be extended to other settings in which the network is observed and avoiding contamination is a primary concern in experiment design. We first describe the unique challenges of designing experiments on developers who work on GCP. We then use simulation to show how proper selection of the randomization unit can avoid estimation bias. This simulation is based on the actual GCP user network.
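The abstract does not say how the randomization unit is chosen. One common way to keep connected users in the same arm is cluster-level randomization, sketched below under loudly stated assumptions: the network is a hypothetical undirected edge list (for example, two developers who share a project), and the unit is the connected component. The helper names `connected_components`, `randomize`, and `contaminated_edges` are illustrative and are not taken from the article.

```python
import random

def connected_components(nodes, edges):
    """Map each node to a component root using union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    return {n: find(n) for n in nodes}

def randomize(nodes, unit_of, seed=0):
    """Assign each randomization unit to treatment (True) or control (False) with p = 0.5."""
    rng = random.Random(seed)
    arm = {u: rng.random() < 0.5 for u in sorted(set(unit_of.values()))}
    return {n: arm[unit_of[n]] for n in nodes}

def contaminated_edges(edges, treated):
    """Count edges whose endpoints landed in different arms (possible spillover)."""
    return sum(1 for a, b in edges if treated[a] != treated[b])

# Hypothetical toy network: an edge links two developers who share a project.
nodes = list(range(8))
edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7)]

user_level = randomize(nodes, {n: n for n in nodes}, seed=1)
cluster_level = randomize(nodes, connected_components(nodes, edges), seed=1)

print("user-level contaminated edges:", contaminated_edges(edges, user_level))
print("cluster-level contaminated edges:", contaminated_edges(edges, cluster_level))
```

Because both endpoints of every edge share a component root, the cluster-level count is always 0, while user-level randomization can split connected developers across arms. The trade-off is fewer, larger units (and less statistical power); a real network with one giant component would need a finer clustering, which this sketch does not attempt.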
    In this paper we seek to define insulin resistance (IR) precisely for a group of Chinese women. Our definition deliberately does not depend on body mass index (BMI) or age, although in other studies, with particular random effects models quite different from the models used here, BMI accounts for a large part of the variability in IR. We accomplish our goal by applying Gauss mixture vector quantization (GMVQ), a clustering technique that was originally developed for lossy data compression. The defining data come from measurements that play major roles in medical practice. A precise description of the data is given in Section 1, and their family structures are described in detail. The data concern lipid levels and the results of an oral glucose tolerance test (OGTT). We apply GMVQ to residuals obtained from regressions of OGTT outcomes and lipids on functions of age and BMI that are inferred from the data. A bootstrap procedure developed for our family data, supplemented by insights from other approaches, leads us to believe that two clusters are appropriate for defining IR precisely. One cluster consists of women who are insulin resistant, and the other of women who appear not to be. Genes and other features are used to predict cluster membership. We argue that prediction with "main effects" alone is not satisfactory, but prediction that includes interactions may be.
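The two-stage procedure in the abstract (regress outcomes on functions of age and BMI, then cluster the residuals with GMVQ) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the quadratic age/BMI basis, the principal-component initialization, and the covariance ridge `reg` are all hypothetical choices. GMVQ is written here as a Lloyd-style iteration that assigns each point to the cell minimizing a Gauss mixture distortion, 0.5·log|K_i| + 0.5·(squared Mahalanobis distance) − log p_i, and then refits each cell's mean, covariance, and weight. The bootstrap used in the paper to choose two clusters is not shown.

```python
import numpy as np

def residualize(Y, age, bmi):
    """Regress each outcome column on an assumed age/BMI basis and return residuals."""
    # Hypothetical basis; the paper infers its own functions of age and BMI from the data.
    X = np.column_stack([np.ones_like(age), age, bmi, age**2, bmi**2, age * bmi])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ beta

def gmvq(Z, k=2, n_iter=50, reg=1e-6):
    """Lloyd-style Gauss mixture VQ: alternate hard assignment and per-cell Gaussian refits."""
    n, d = Z.shape
    # Deterministic start (a heuristic): split the first principal component into k quantile bins.
    Zc = Z - Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    order = np.argsort(Zc @ vt[0])
    labels = np.empty(n, dtype=int)
    labels[order] = np.arange(n) * k // n
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for i in range(k):
            Zi = Z[labels == i]
            if len(Zi) <= d:  # too few points to fit a covariance; leave this cell empty
                continue
            mu = Zi.mean(axis=0)
            cov = np.cov(Zi, rowvar=False) + reg * np.eye(d)
            diff = Z - mu
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            # Gauss mixture distortion: 0.5*log|K_i| + 0.5*squared Mahalanobis - log p_i
            dist[:, i] = 0.5 * logdet + 0.5 * maha - np.log(len(Zi) / n)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable: a fixed point of the Lloyd iteration
        labels = new_labels
    return labels

# Usage on synthetic data (not the study's data): two outcomes plus a hidden group shift.
rng = np.random.default_rng(0)
n = 400
age, bmi = rng.uniform(30, 70, n), rng.uniform(18, 35, n)
group = np.arange(n) % 2
Y = np.column_stack([0.05 * age + 0.2 * bmi + 4 * group, 0.1 * bmi - 3 * group])
Y = Y + rng.normal(0, 0.5, (n, 2))
labels = gmvq(residualize(Y, age, bmi), k=2)
```

Unlike k-means, each cell carries its own covariance and weight, so clusters of different shape and size can be separated; like any Lloyd iteration, the result depends on the starting partition.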
    Prediction of Advertiser Churn for Google AdWords
    Jim Koehler
    Adam Ghobarah
    JSM Proceedings, American Statistical Association (2010) (to appear)
    Google AdWords has thousands of advertisers participating in auctions to show their advertisements. Google's business model has two goals: first, to provide relevant information to users and, second, to provide advertising opportunities that help advertisers meet their business needs. To serve both parties well, it is important to find relevant information for users while also helping advertisers advertise more efficiently and effectively. In this paper, we approach the problem of better connecting users and advertisers from a customer relationship management point of view. More specifically, we aim to retain more advertisers in AdWords by identifying and helping advertisers who are not succeeding with Google AdWords. In this work, we first propose a new definition of advertiser churn for AdWords advertisers; second, we present a method for carefully selecting a homogeneous group of advertisers to use in understanding and predicting advertiser churn; and third, we build a model that predicts advertiser churn using machine learning algorithms.
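The three steps in the abstract (define churn, select a homogeneous cohort, fit a predictive model) can be outlined as a small pipeline. The abstract gives neither the paper's churn definition, its cohort criteria, nor its learners, so every rule below is a hypothetical placeholder: churn is taken to mean zero spend in the last `window` days, the cohort is an account-age band, and plain logistic regression stands in for "machine learning algorithms".

```python
import numpy as np

def churn_label(daily_spend, window=30):
    """Hypothetical churn rule: 1 if an advertiser has zero spend in the last `window` days."""
    return (daily_spend[:, -window:].sum(axis=1) == 0).astype(int)

def in_cohort(tenure_days, lo=90, hi=365):
    """Hypothetical homogeneity filter: keep advertisers whose account age lies in [lo, hi] days."""
    return (tenure_days >= lo) & (tenure_days <= hi)

def _with_bias(X):
    """Prepend a column of ones so the model learns an intercept."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression trained by batch gradient descent on the mean log loss."""
    Xb = _with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log loss
    return w

def predict_churn_proba(w, X):
    """Predicted probability of churn for each row of features X."""
    return 1.0 / (1.0 + np.exp(-_with_bias(X) @ w))
```

In use, one would compute `churn_label` on the cohort selected by `in_cohort`, build features only from activity before the label window (to avoid leaking the outcome into the predictors), and pass them to `fit_logistic`.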