Tracking Audience Statistics with HyperLogLog
Abstract
HyperLogLog is the state-of-the-art, nearly optimal algorithm for approximate
cardinality estimation. We consider the application of HyperLogLog for building
scalable systems for internet audience reach reporting.
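To make "approximate cardinality estimation" concrete, here is a minimal Python sketch of the standard HyperLogLog estimator. Each register keeps the maximum leading-zero rank seen for its bucket, the estimate is a bias-corrected harmonic mean over the registers, and linear counting handles small cardinalities. This is the baseline algorithm only, not the extension presented in this paper. The class name `HyperLogLog`, the SHA-1-derived 64-bit hash, and the default precision `p = 12` are illustrative assumptions. The `merge` method shows why the structure suits scalable reach reporting: sketches built on different machines or days combine into the sketch of their union by taking the element-wise maximum of their registers.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not a production implementation)."""

    def __init__(self, p: int = 12):
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p                  # number of registers (buckets)
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1;
        # any well-mixed 64-bit hash would do.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        j = x >> (64 - self.p)                   # top p bits select a register
        w = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-indexed position of the leftmost 1-bit in w;
        # an all-zero w gets rank 64 - p + 1.
        rank = (64 - self.p) - w.bit_length() + 1
        if rank > self.registers[j]:
            self.registers[j] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Sketch of the union: element-wise max of the registers.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        # Bias-correction constant alpha_m from the original HyperLogLog paper.
        if m == 16:
            alpha = 0.673
        elif m == 32:
            alpha = 0.697
        elif m == 64:
            alpha = 0.709
        else:
            alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: linear counting on empty registers.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(round(hll.estimate()))
```

With `p = 12` (4,096 registers) the relative standard error is about 1.04/√m ≈ 1.6%, so the printed estimate typically lands within a few percent of 100,000 while the sketch stores only 4,096 small integers. Adding the same ID again never changes any register, which is why repeated visits by one user do not inflate the reach count.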
We present an extension of HyperLogLog that enables tracking additional
information about the audience, such as its demographic distribution, frequency
histogram, or fraction of spam. We also give an intuitive explanation of why
HyperLogLog works, which we find useful because the intuition behind the proof in the
original HyperLogLog paper takes considerable effort to grasp. Both the
extension and the intuition are generic and are not limited to internet
reach reporting.