- Andreas Ulbrich
- Evgeny Sergeevich Skvortsov
- Jeffrey Scott Wilhelm
- Josh Bao
- Lawrence Tsang
- Will Bradbury
Abstract
HyperLogLog is the state-of-the-art, nearly optimal algorithm for approximate cardinality estimation. We consider the application of HyperLogLog to building scalable systems for internet audience reach reporting. We present an extension of HyperLogLog that enables tracking additional information about the audience, such as demographic distribution, frequency histogram, or fraction of spam. We also give an intuitive explanation of why HyperLogLog works, which we find useful because the intuition behind the proof in the original HyperLogLog paper takes considerable effort to understand. The extension and the intuition are themselves generic and are not limited to internet reach reporting.
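For readers unfamiliar with the baseline algorithm, the following is a minimal sketch of plain HyperLogLog, not the extension this paper presents. It assumes a 64-bit hash taken from SHA-256 and the standard bias constant and small-range correction from the original HyperLogLog paper. The class name, the method names, and the default precision `p=12` are illustrative choices, not taken from the paper.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative plain HyperLogLog sketch with m = 2**p registers."""

    def __init__(self, p=12):
        if p < 7:
            # The alpha formula below is the one stated for m >= 128.
            raise ValueError("this sketch assumes p >= 7")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: take the first 64 bits of SHA-256 as the hash value.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        index = h >> width                # top p bits select a register
        rest = h & ((1 << width) - 1)     # remaining bits
        # Rank = position of the leftmost 1-bit in `rest` (1-based);
        # an all-zero remainder gets rank width + 1.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # The sketch of a union is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        # Raw estimate: normalized harmonic mean of 2**register.
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            e = self.m * math.log(self.m / zeros)
        return e
```

With `m` registers the relative standard error is roughly `1.04 / sqrt(m)`, about 1.6% for `p=12`. Because merging is a register-wise maximum, sketches built on separate machines can be combined without revisiting the raw data, which is the property that makes the approach attractive for distributed reach reporting.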