Diagnosing Latency in Multi-Tier Black-Box Services

Gideon Mann
5th Workshop on Large Scale Distributed Systems and Middleware (LADIS 2011) (to appear)

Abstract

As multi-tier cloud applications become pervasive, we need better tools for understanding their performance. This paper presents a system that analyzes observed or desired changes to the end-to-end latency profile of a large distributed application and identifies their underlying causes. It recognizes changes to system configuration, workload, or performance of individual services that lead to the observed or desired outcome. Experiments on an industrial datacenter demonstrate the utility of the system.