Evgeny Skvortsov
Evgeny received B.Sc. and M.Sc. degrees in Mathematics from Ural State University (Yekaterinburg, Russia) in 2001 and 2003. Afterwards, he joined the School of Computer Science at Simon Fraser University (Burnaby, Canada) to pursue a PhD degree. He received a PhD in Computer Science in December 2009. His PhD dissertation was on probabilistic analysis of heuristics for solving NP-complete problems. At Google, Evgeny works on Brand Advertising. His research interests include Machine Intelligence, Statistics, Randomized Distributed Computing and Automatic Reasoning.
Authored Publications
Tracking Audience Statistics with HyperLogLog
Andreas Ulbrich
Jeffrey Scott Wilhelm
Josh Bao
Lawrence Tsang
Will Bradbury
Google (2021)
Preview abstract
HyperLogLog is the state-of-the-art, nearly optimal algorithm for approximate
cardinality estimation. We consider the application of HyperLogLog to building
scalable systems for internet audience reach reporting.
We present an extension of HyperLogLog that enables tracking additional
information about the audience, such as demographic distribution, frequency
histogram or fraction of spam. We also give an intuitive explanation of why
HyperLogLog works, which we find useful, as the intuition behind the proof in
the original HyperLogLog paper requires a lot of effort to understand. The
extension and the intuition are themselves generic and are not limited to
internet reach reporting.
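For readers unfamiliar with the base algorithm, the following is a minimal sketch of plain HyperLogLog in Python. It is not the extension described in the abstract. The class name `HyperLogLog`, the precision parameter `p`, and the use of SHA-256 as the hash function are assumptions made for this illustration. Each register stores the largest "rank" (position of the first 1-bit) seen among the hashes routed to it, and the estimate is a bias-corrected harmonic mean over the registers, with the usual linear-counting fallback for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with m = 2**p registers (assumes p >= 7)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def _hash(self, item):
        # 64-bit hash taken from SHA-256; an assumption for this sketch.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash(item)
        width = 64 - self.p
        idx = h >> width                  # top p bits select the register
        rest = h & ((1 << width) - 1)     # remaining bits determine the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 if `rest` is all zeros).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        # Bias constant valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: linear counting on empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog(p=12)
    for i in range(100_000):
        sketch.add(f"user-{i}")
    print(round(sketch.estimate()))
```

With `p=12` the sketch uses 4096 registers, which gives a typical relative error of roughly 1.04 / sqrt(4096), or about 1.6%.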
Privacy-centric Cross-publisher Reach and Frequency Estimation Via Vector of Counts
Jason Frye
Jiayu Peng
Jim Koehler
Joseph Goodknight Knightbrook
Laura Book
Michael Daub
Scott Schneider
Sheng Ma
Xichen Huang
Ying Liu
Yunwen Yang
Preston Lee
Google Inc. (2021)
Preview abstract
Reach and frequency are two of the most important metrics in advertising management. Ads are distributed to different publishers in the hope of maximizing reach at an effective frequency. Reliable cross-publisher reach and frequency measurement is needed to assess the actual ROI of branding and to improve the budget allocation strategy. However, cross-publisher measurement is non-trivial under strict privacy restrictions.
This paper introduces the first locally differentially private solution in the literature to cross-publisher reach and frequency estimation. The solution consists of a family of algorithms based on a data structure called Vector of Counts (VoC). Complying with the standard of differential privacy, the solution prevents attackers from telling with enough confidence whether any given user is reached. The solution enjoys particularly high accuracy for estimation across two publishers. For more than two publishers, the solution makes a careful bias-variance trade-off: it enjoys small variance, at the risk of bias in the presence of cross-publisher correlation of user activity.
Preview abstract
This paper proposes an extension to the actionable reach modeling for cross-media audiences, where TV audiences are measured via extrapolation from a panel or partial set-top-box data. In essence, a set of TV Virtual People are exclusively associated with, or represented by, a TV panelist q via an HR sketch s, which is an extension to an HLL sketch. We extrapolate q’s TV activity to this set of Virtual People, thus s serves as an input to the system and can be deduplicated with the digital part of the audience via a simple sketch merge. The main contribution of this paper is an efficient method that takes as input Q panelists and P, the union of Virtual People they represent, and assigns an HR sketch to each panelist. The efficiency-accuracy trade-off is controlled by a depth parameter D, and to help decide D in practical systems we provide an upper bound to the performance loss due to a finite depth D. The size of the deep sketch is roughly proportional to the depth. For example, for an error of no more than 1%, we can set D to 9.
Preview abstract
We introduce methods for efficient and privacy-safe modeling of reach and the demographic composition of cross-media campaigns. Cross-media campaign traffic is composed of two parts: digital
and TV. Digital traffic is estimated based on event-level data available in server logs. TV traffic is
extrapolated from a combination of panel and set-top-box or smart-TV data. The Virtual-People
methodology introduced in the paper "Virtual People: Actionable Reach Modeling" allows for efficient measurement of digital audiences. In this paper
we extend this methodology to work with the extrapolated data sources associated with TV data,
thus generalizing it for cross-media measurement.
Privacy-Preserving Secure Cardinality and Frequency Estimation
Benjamin Kreuter
Raimundo Mirisola
Yao Wang
Google, LLC (2020)
Preview abstract
In this paper we introduce a new family of methods for cardinality and
frequency estimation. These methods combine aspects of HyperLogLog
(HLL) and Bloom filters in order to build a sketch that, like HLL, is
substantially more compact than a Bloom filter, but like a Bloom filter
maintains the ability to union sketches with a bucket-wise sum. Together
these properties enable the creation of a scalable secure multi-party computation protocol that takes advantage of homomorphic encryption to
combine sketches across multiple untrusted parties. The protocol limits
the amount of information that participants learn to differentially private
estimates of the union of sketches and some partial information about the
Venn diagram of the per-sketch cardinalities.
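The property the abstract relies on, that sketches can be unioned with a bucket-wise sum, can be illustrated with a toy count sketch in Python. This is only a sketch under stated assumptions: it uses a single uniform hash into `m` buckets, so it does not reproduce the compactness of the sketch family in the paper, and it includes no encryption or differential privacy. The function names `make_sketch`, `union` and `estimate_cardinality` are hypothetical. Cardinality is estimated from the fraction of empty buckets (linear counting).

```python
import hashlib
import math


def make_sketch(items, m=4096):
    """Count sketch: each distinct item adds 1 to one hashed bucket."""
    sketch = [0] * m
    for item in set(items):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        sketch[int.from_bytes(digest[:8], "big") % m] += 1
    return sketch


def union(a, b):
    """Union of two sketches as a bucket-wise sum."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of buckets")
    return [x + y for x, y in zip(a, b)]


def estimate_cardinality(sketch):
    """Linear-counting estimate from the fraction of empty buckets."""
    m = len(sketch)
    zeros = sketch.count(0)
    if zeros == 0:
        raise ValueError("sketch is saturated; use more buckets")
    return -m * math.log(zeros / m)


if __name__ == "__main__":
    publisher_a = make_sketch(range(0, 300))
    publisher_b = make_sketch(range(200, 500))  # overlaps A on 100 items
    combined = union(publisher_a, publisher_b)
    print(round(estimate_cardinality(combined)))
```

Because the union is a plain sum, it is the kind of operation that can be carried out under additively homomorphic encryption, which is why this property matters for a multi-party protocol. Items seen by both parties are counted twice in a bucket, but the estimate depends only on which buckets are non-empty, so overlap does not inflate it.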
Preview abstract
We introduce a method for serving models that estimate reach and demographics of cross-device
online audiences. The method assigns virtual people identifiers to events. The reach of a set of
events is estimated as a simple count of distinct virtual people assigned to these events. This
allows efficient serving of reach models at large scale. We formalize what it means for a reach
model to be actionable and prove that any actionable reach model is equivalent to some virtual
people model. We present algorithms for encoding reach models with virtual people and show that
a wide variety of modeling techniques can be implemented with this approach.
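The core serving idea, counting distinct virtual people assigned to a set of events, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the encoding algorithms from the paper: the functions `assign_virtual_person` and `estimate_reach`, the `cookie_id` event field, and the uniform hash-based assignment into a fixed pool are all assumptions made for the example.

```python
import hashlib


def assign_virtual_person(cookie_id, pool_size):
    """Deterministically map an identifier to a virtual person ID.

    Hypothetical assignment rule: a uniform hash into a fixed pool.
    """
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pool_size


def estimate_reach(events, pool_size):
    """Reach = number of distinct virtual people assigned to the events."""
    return len({assign_virtual_person(e["cookie_id"], pool_size) for e in events})


if __name__ == "__main__":
    events = [
        {"cookie_id": "phone-123"},
        {"cookie_id": "laptop-456"},
        {"cookie_id": "phone-123"},  # repeat impression on the same cookie
    ]
    print(estimate_reach(events, pool_size=1_000_000))
```

Because the assignment is a deterministic function of the event, each event can be labeled independently, and reach for any subset of events reduces to a distinct count. A pool smaller than the number of cookies makes several cookies share a virtual person, which is the mechanism by which such a model can deduplicate devices belonging to one person.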
Preview abstract
We extend the work of Koehler, Skvortsov, and Vos (2013) to measure cross-device online audiences.
The method performs demographic corrections in the usual way, device by device. A new method
that converts cross-device cookie counts to user counts is introduced. We provide practical recipes
for fitting this transformation function and then demonstrate its use with online panel data from
Japan.
Preview abstract
We present a method for measuring the reach and frequency of online ad campaigns by audience attributes. This method uses a combination of data sources, including ad server logs, publisher-provided user data (PPD), census data, and a representative online panel. It adjusts for known problems with cookie data and for potentially non-representative and inaccurate PPD. It generalizes to multiple publishers and to targeting based on the PPD. The method includes the conversion of adjusted cookie counts to unique audience counts. The benefit of our method is that we get both reduced variance from server logs and reduced bias from the panel. Simulation results and a case study are presented.