Diagnosing Latency in Multi-Tier Black-Box Services
Abstract
As multi-tier cloud applications become pervasive, we need better tools for understanding their performance. This paper presents a system that analyzes observed or desired changes to end-to-end latency prole in a large distributed application, and identifies their underlying causes. It recognizes changes to system conguration, workload, or performance of individual services that lead to the observed or desired outcome. Experiments on an industrial datacenter demonstrate the utility of the system.