Tracking Audience Statistics with HyperLogLog

Andreas Ulbrich
Jeffrey Scott Wilhelm
Josh Bao
Lawrence Tsang
Will Bradbury
Google (2021)

Abstract

HyperLogLog is the state-of-the-art, nearly optimal algorithm for approximate cardinality estimation. We consider the application of HyperLogLog to building scalable systems for internet audience reach reporting. We present an extension of HyperLogLog that enables tracking additional information about the audience, such as demographic distribution, frequency histogram, or fraction of spam. We also give an intuitive explanation of why HyperLogLog works, which we find useful because the intuition behind the proof in the original HyperLogLog paper takes considerable effort to understand. The extension and the intuition are themselves generic and are not limited to internet reach reporting.
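
For readers unfamiliar with the baseline algorithm, the following is a minimal Python sketch of standard HyperLogLog: an array of registers, a leading-zero rank per hashed item, and a harmonic-mean estimate with the usual small-range correction. It is not the extension presented in the paper. The class name, the use of SHA-256 as the hash function, and the default precision p = 12 are assumptions made for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative baseline HyperLogLog sketch (not the paper's extension)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers. The alpha formula used in
        # estimate() is the standard one for m >= 128, hence p >= 7.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Assumption: the first 64 bits of SHA-256 serve as a uniform hash.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        index = h >> width                 # top p bits select a register
        rest = h & ((1 << width) - 1)      # remaining bits determine the rank
        # Rank is the 1-based position of the leftmost 1-bit in `rest`;
        # an all-zero remainder gets rank width + 1.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # The union of two audiences is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting while
        # empty registers remain and the raw estimate is small.
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)
        return raw
```

With m registers the relative standard error is roughly 1.04 / sqrt(m), about 1.6% for p = 12. Re-adding an item never changes the sketch, and sketches built on separate machines can be combined with `merge`, which is what makes the structure convenient for distributed reach reporting.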