Google Research

Meaningful availability

17th USENIX Symposium on Networked Systems Design and Implementation (NSDI'20) (2020)

Abstract

Accurate measurement of service availability is the cornerstone of good service management: it quantifies the gap between user expectation and system performance, and provides actionable data to prioritize development and operational tasks. We propose a novel metric, user-uptime, which is event-based but is time-sensitive and which approximates aggregated user-perceived reliability better than current metrics. For a holistic view of availability across timescales from minutes to months or quarters, we augment user-uptime with a novel aggregation and visualization paradigm: windowed uptime. Using an example from G Suite we demonstrate its effectiveness in differentiating between unreliability caused by flakiness and an extended outage.
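
The abstract names the two metrics without giving formulas, so the following is only a minimal sketch under stated assumptions. It assumes user-uptime is the ratio of total uptime to total uptime plus downtime, summed across users, and that windowed uptime reports the worst availability seen in any window of a given size. The function names, the bucketed input format, and the sample numbers are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def user_uptime(per_user: Dict[str, Tuple[float, float]]) -> float:
    """Assumed definition: total uptime / (total uptime + total downtime).

    `per_user` maps a (hypothetical) user id to (uptime_s, downtime_s).
    """
    up = sum(u for u, _ in per_user.values())
    down = sum(d for _, d in per_user.values())
    total = up + down
    return up / total if total else 1.0


def windowed_uptime(up: List[float], down: List[float], window: int) -> float:
    """Assumed definition: worst availability over any contiguous run of
    `window` time buckets. `up[i]` and `down[i]` are the uptime and downtime
    (in seconds, summed over users) observed in bucket i.
    """
    worst = 1.0
    for start in range(len(up) - window + 1):
        u = sum(up[start:start + window])
        d = sum(down[start:start + window])
        if u + d > 0:
            worst = min(worst, u / (u + d))
    return worst


# Two illustrative six-bucket traces with the same overall availability.
flaky_up, flaky_down = [50.0] * 6, [10.0] * 6            # a little downtime everywhere
outage_up = [60.0, 60.0, 0.0, 60.0, 60.0, 60.0]          # one bucket fully down
outage_down = [0.0, 0.0, 60.0, 0.0, 0.0, 0.0]

for name, u, d in [("flaky", flaky_up, flaky_down),
                   ("outage", outage_up, outage_down)]:
    per_window = {w: round(windowed_uptime(u, d, w), 3) for w in (1, 3, 6)}
    print(name, per_window)
```

In this sketch both traces have the same availability over the full six-bucket window, but the single-bucket window separates them: the flaky trace never drops below its average, while the outage trace falls to zero. That is the kind of distinction between flakiness and an extended outage that the abstract attributes to windowed uptime.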
