Meaningful availability
Abstract
Accurate measurement of service availability is the cornerstone of good service management: it quantifies the gap between user expectation and system performance, and provides actionable data to prioritize development and operational tasks. We propose a novel metric, user-uptime, which is event- based but is time-sensitive and which approximates aggregated user-perceived reliability better than current metrics. For a holistic view of availability across timescales from minutes to months or quarters, we augment user-uptime with a novel aggregation and visualization paradigm: windowed uptime. Using an example from G Suite we demonstrate its effectiveness in differentiating between unreliability caused
by flakiness and an extended outage.
by flakiness and an extended outage.