Evgeny Skvortsov

Evgeny received B.Sc. and M.Sc. degrees in Mathematics from Ural State University (Yekaterinburg, Russia) in 2001 and 2003, respectively. He then came to the School of Computer Science of Simon Fraser University (Burnaby, Canada) to pursue a PhD, which he received in Computer Science in December 2009. His PhD dissertation was on the probabilistic analysis of heuristics for solving NP-complete problems. At Google, Evgeny works on Brand Advertising. His research interests include Machine Intelligence, Statistics, Randomized Distributed Computing, and Automatic Reasoning.
Authored Publications
    Tracking Audience Statistics with HyperLogLog
    Andreas Ulbrich
    Jeffrey Scott Wilhelm
    Josh Bao
    Lawrence Tsang
    Will Bradbury
    Google (2021)
    HyperLogLog is the state-of-the-art, nearly optimal algorithm for approximate cardinality estimation. We consider the application of HyperLogLog to building scalable systems for internet audience reach reporting. We present an extension of HyperLogLog that enables tracking additional information about the audience, such as demographic distribution, frequency histogram, or fraction of spam. We also give an intuitive explanation of why HyperLogLog works, which we find useful because gaining intuition from the proof in the original HyperLogLog paper takes considerable effort. Both the extension and the intuition are generic and are not limited to internet reach reporting.
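For reference, here is a minimal sketch of textbook HyperLogLog (Flajolet et al., 2007), not the extension the paper describes. Each item's 64-bit hash selects one of 2^p registers with its top p bits; the register keeps the maximum rank (position of the first 1-bit) seen in the remaining bits; and a bias-corrected harmonic mean of 2^-register values gives the estimate. Merging two sketches is a register-wise max, which is what makes HLL convenient for distributed reach reporting. The BLAKE2b hash, p = 12, and the `HyperLogLog` class name are arbitrary choices for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (illustrative sketch only)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        # 64-bit hash from BLAKE2b; any well-mixed hash would do.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        index = h >> (64 - self.p)              # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # The union of two sketches is a register-wise max.
        assert self.p == other.p, "sketches must use the same precision"
        self.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: two overlapping audiences, 75,000 distinct users in total.
a, b = HyperLogLog(), HyperLogLog()
for i in range(50_000):
    a.add(f"user-{i}")
for i in range(25_000, 75_000):
    b.add(f"user-{i}")
a.merge(b)
print(round(a.estimate()))  # near 75,000; standard error ~1.04/sqrt(4096), about 1.6%
```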
    Privacy-centric Cross-publisher Reach and Frequency Estimation Via Vector of Counts
    Jason Frye
    Jiayu Peng
    Jim Koehler
    Joseph Goodknight Knightbrook
    Laura Book
    Michael Daub
    Scott Schneider
    Sheng Ma
    Xichen Huang
    Ying Liu
    Yunwen Yang
    Preston Lee
    Google Inc. (2021)
    Reach and frequency are two of the most important metrics in advertising management. Ads are distributed to different publishers with the hope of maximizing reach at effective frequency. Reliable cross-publisher reach and frequency measurement is needed to assess the actual ROI of branding and to improve the budget allocation strategy. However, cross-publisher measurement is non-trivial under strict privacy restrictions. This paper introduces the first locally differentially private solution in the literature to cross-publisher reach and frequency estimation. The solution consists of a family of algorithms based on a data structure called Vector of Counts (VoC). Complying with the standard of differential privacy, the solution prevents attackers from telling with sufficient confidence whether any given user was reached. The solution achieves particularly high accuracy when estimating across two publishers. For more than two publishers, it makes a careful bias-variance trade-off: it has small variance, at the risk of bias in the presence of cross-publisher correlation of user activity.
    This paper proposes an extension to actionable reach modeling for cross-media audiences, where TV audiences are measured via extrapolation from a panel or partial set-top-box data. In essence, a set of TV Virtual People is exclusively associated with, or represented by, a TV panelist q via an HR sketch s, which is an extension of an HLL sketch. We extrapolate q’s TV activity to this set of Virtual People; thus s serves as an input to the system and can be deduplicated with the digital part of the audience via a simple sketch merge. The main contribution of this paper is an efficient method that takes as input Q panelists and P, the union of Virtual People they represent, and assigns an HR sketch to each panelist. The efficiency-accuracy trade-off is controlled by a depth parameter D, and to help choose D in practical systems we provide an upper bound on the performance loss due to a finite depth D. The size of the deep sketch is roughly proportional to the depth. For example, for an error of no more than 1%, D can be set to 9.
    We introduce methods for efficient and privacy-safe modeling of reach and the demographic composition of cross-media campaigns. Cross-media campaign traffic is composed of two parts: digital and TV. Digital traffic is estimated from event-level data available in server logs. TV traffic is extrapolated from a combination of panel and set-top-box or smart-TV data. The Virtual People methodology introduced in the paper "Virtual People: Actionable Reach Modeling" allows for efficient measurement of digital audiences. In this paper we extend this methodology to work with the extrapolated data sources associated with TV data, thus generalizing it for cross-media measurement.
    In this paper we introduce a new family of methods for cardinality and frequency estimation. These methods combine aspects of HyperLogLog (HLL) and Bloom filters in order to build a sketch that, like HLL, is substantially more compact than a Bloom filter, but like a Bloom filter maintains the ability to union sketches with a bucket-wise sum. Together these properties enable the creation of a scalable secure multi-party computation protocol that takes advantage of homomorphic encryption to combine sketches across multiple untrusted parties. The protocol limits the amount of information that participants learn to differentially private estimates of the union of sketches and some partial information about the Venn diagram of the per-sketch cardinalities.
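To illustrate why a bucket-wise sum can act as a union, the sketch below uses a hypothetical counting sketch: every item increments exactly one of m buckets, and bucket i is chosen with probability proportional to exp(-a·i/m), so low-index buckets fill quickly (giving HLL-like compactness) while the counts still add like a counting Bloom filter. Because the bucket depends only on the item's hash, a user seen by two parties lands in the same bucket, and the number of non-empty buckets in the summed sketch reflects the union. The class name, the exponential bucket distribution, the parameters m = 4096 and a = 10, and the moment-matching estimator (solve for the n whose expected number of occupied buckets equals the observed count) are all assumptions for this example; the paper's exact construction, estimator, encryption, and differential-privacy noise are not reproduced here.

```python
import bisect
import copy
import hashlib
import math


class ExponentialCountingSketch:
    """Hypothetical sketch: each item increments one of m buckets, bucket i
    being chosen with probability proportional to exp(-a * i / m)."""

    def __init__(self, m: int = 4096, a: float = 10.0):
        self.m = m
        weights = [math.exp(-a * i / m) for i in range(m)]
        total = sum(weights)
        self.q = [w / total for w in weights]   # bucket probabilities
        self.cdf, acc = [], 0.0
        for qi in self.q:
            acc += qi
            self.cdf.append(acc)
        self.counts = [0] * m

    def _bucket(self, item: str) -> int:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        u = int.from_bytes(digest, "big") / 2.0 ** 64   # uniform in [0, 1)
        # Inverse-CDF lookup; clamp guards against float rounding at the end.
        return min(bisect.bisect_right(self.cdf, u), self.m - 1)

    def add(self, item: str) -> None:
        self.counts[self._bucket(item)] += 1

    def union(self, other: "ExponentialCountingSketch") -> "ExponentialCountingSketch":
        # With shared parameters, the union is a plain bucket-wise sum.
        assert self.q == other.q, "sketches must share parameters"
        merged = copy.copy(self)
        merged.counts = [x + y for x, y in zip(self.counts, other.counts)]
        return merged

    def estimate_cardinality(self) -> float:
        k = sum(1 for c in self.counts if c > 0)
        if k == 0:
            return 0.0
        if k == self.m:
            return math.inf   # saturated: every bucket is occupied

        def expected_occupied(n: float) -> float:
            # E[#non-empty buckets] = sum_i 1 - (1 - q_i)^n
            return sum(-math.expm1(n * math.log1p(-qi)) for qi in self.q)

        lo, hi = 0.0, 1.0
        while expected_occupied(hi) < k:
            hi *= 2
        for _ in range(60):   # bisection on the increasing function above
            mid = (lo + hi) / 2
            if expected_occupied(mid) < k:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2


# Usage: two parties with overlapping audiences, 15,000 distinct users in total.
pub_a, pub_b = ExponentialCountingSketch(), ExponentialCountingSketch()
for i in range(10_000):
    pub_a.add(f"user-{i}")
for i in range(5_000, 15_000):
    pub_b.add(f"user-{i}")
combined = pub_a.union(pub_b)
print(round(combined.estimate_cardinality()))  # close to 15,000 (a few percent error)
```

Note that the summed counts still total the number of impressions (20,000 here), which is the kind of information a frequency estimator could build on.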
    We introduce a method for serving models that estimate reach and demographics of cross-device online audiences. The method assigns virtual people identifiers to events. The reach of a set of events is estimated as a simple count of distinct virtual people assigned to these events. This allows efficient serving of reach models at large scale. We formalize what it means for a reach model to be actionable and prove that any actionable reach model is equivalent to some virtual people model. We present algorithms for encoding reach models with virtual people and show that a wide variety of modeling techniques can be implemented with this approach.
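The central idea, that reach is a distinct count of virtual person identifiers, can be illustrated with a deliberately simplified model. In the sketch below, which is hypothetical and not the paper's model, each cookie is deterministically hashed to one of `pool_size` virtual people, so several cookies can collapse onto one person. The names `Event`, `assign_virtual_person`, and `reach`, the seed string, and the uniform pool are illustrative assumptions; a production model would presumably condition the assignment on demographic and device signals.

```python
import hashlib
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    cookie_id: str
    campaign: str


def assign_virtual_person(cookie_id: str, pool_size: int, seed: str = "vp-demo") -> int:
    """Hypothetical model: hash a cookie to one of `pool_size` virtual people.

    The mapping is deterministic, so the same cookie always yields the same
    virtual person, and distinct cookies may share one (a user with many cookies).
    """
    digest = hashlib.sha256(f"{seed}:{cookie_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pool_size


def reach(events: Iterable[Event], pool_size: int) -> int:
    # Reach is simply the number of distinct virtual people among the events.
    return len({assign_virtual_person(e.cookie_id, pool_size) for e in events})


# Usage: 1,000 hypothetical cookies standing in for a population of 100 people.
POOL = 100
events = [Event(f"cookie-{i}", "A" if i % 2 else "B") for i in range(1_000)]
campaign_a = [e for e in events if e.campaign == "A"]
campaign_b = [e for e in events if e.campaign == "B"]
print(reach(campaign_a, POOL), reach(campaign_b, POOL), reach(events, POOL))
```

Because every report is the same distinct count over the same assignment, reach figures are consistent by construction: the reach of a union is never below the reach of either part and never above their sum.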
    Measuring Cross-Device Online Audiences
    Jim Koehler
    Sheng Ma
    Song Liu
    Google, Inc. (2016), pp. 1-33 (to appear)
    We extend the work of Koehler, Skvortsov, and Vos (2013) to measure cross-device online audiences. The method performs demographic corrections device by device in the usual way. A new method that converts cross-device cookie counts to user counts is introduced. We provide practical recipes for fitting this transformation function and then demonstrate it on online panel data from Japan.
    A Method for Measuring Online Audiences
    Jim Koehler
    Google Inc (2013), pp. 1-24 (to appear)
    We present a method for measuring the reach and frequency of online ad campaigns by audience attributes. This method uses a combination of data sources, including ad server logs, publisher-provided user data (PPD), census data, and a representative online panel. It adjusts for known problems with cookie data and for potentially non-representative and inaccurate PPD. It generalizes to multiple publishers and to targeting based on the PPD. The method includes the conversion of adjusted cookie counts to unique audience counts. The benefit of our method is that we get both the reduced variance of server logs and the reduced bias of the panel. Simulation results and a case study are presented.