Google Research

Tracking Audience Statistics with HyperLogLog

Google (2021)

Abstract

HyperLogLog is the state-of-the-art, nearly optimal algorithm for approximate cardinality estimation. We consider the application of HyperLogLog to building scalable systems for internet audience reach reporting. We present an extension of HyperLogLog that enables tracking additional information about the audience, such as its demographic distribution, frequency histogram, or fraction of spam. We also give an intuitive explanation of why HyperLogLog works, which we find useful because the intuition behind the proof in the original HyperLogLog paper takes considerable effort to understand. The extension and the intuition are themselves generic and are not limited to internet reach reporting.
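
For readers unfamiliar with the base algorithm, the following is a minimal sketch of standard HyperLogLog cardinality estimation. It is not the extension described in the abstract. The class name `HyperLogLog`, the precision parameter `p`, the SHA-256-based 64-bit hash, and the small-range correction threshold are illustrative assumptions chosen for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative sketch of standard HyperLogLog (hypothetical class, not the paper's code)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; assumed default, valid for m >= 128.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        # Assumption: take the first 8 bytes of SHA-256 as a 64-bit hash.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        width = 64 - self.p
        idx = h >> width                      # top p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two audiences is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        # Harmonic-mean estimator over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw
```

Because each register keeps only a maximum, adding the same user twice leaves the sketch unchanged, and sketches built on different machines can be combined with `merge`, which is what makes the structure attractive for reach reporting at scale.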
