Krzysztof Ostrowski
Authored Publications
End-to-end latency of serving jobs in distributed and shared environments, such as a cloud, is an important metric for job owners and infrastructure providers. Yet it is notoriously challenging to model precisely, since it is affected by a large collection of unrelated moving pieces, from the software design to the job schedulers' strategies. In this work we present a novel approach to modeling latency by tracking how it varies with CPU usage. We train a classifier to automatically assign the latency behavior of methods to one of three classes: constant latency regardless of CPU, uncorrelated latency and CPU, and predictable latency as a function of CPU. We apply our model to a random sample of serving jobs running on the Google infrastructure and illustrate unexpected and insightful patterns of latency variation with CPU. The visualization of latency-CPU variations and the corresponding class may be used by both job owners and infrastructure providers for a variety of applications, such as smarter latency alerting, latency-aware configuration of jobs, and automated detection of changes in behavior, whether over time, during pre-release testing, or across data centers.
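To make the three classes concrete, here is a minimal sketch in Python. It is not the trained classifier described in the abstract: it substitutes a simple rule-based heuristic (coefficient of variation plus Pearson correlation) for illustration only. The function names, the thresholds `flat_cv` and `min_abs_corr`, and the class labels are all assumptions, and Pearson correlation captures only linear dependence, so a nonlinear but predictable relationship could be mislabeled.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def classify_latency(cpu, latency, flat_cv=0.05, min_abs_corr=0.8):
    """Assign paired (CPU usage, latency) samples to one of three classes.

    Hypothetical heuristic, not the paper's model:
      - "constant":     latency barely varies (coefficient of variation <= flat_cv)
      - "predictable":  latency varies and tracks CPU linearly (|r| >= min_abs_corr)
      - "uncorrelated": latency varies but does not track CPU
    """
    if len(cpu) != len(latency) or len(cpu) < 3:
        raise ValueError("need at least 3 paired samples")
    avg = mean(latency)
    cv = pstdev(latency) / avg if avg > 0 else 0.0
    if cv <= flat_cv:
        return "constant"
    if abs(pearson(cpu, latency)) >= min_abs_corr:
        return "predictable"
    return "uncorrelated"


if __name__ == "__main__":
    cpu = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(classify_latency(cpu, [10.0, 10.1, 9.9, 10.0, 10.05]))
    print(classify_latency(cpu, [10.0, 20.0, 30.0, 40.0, 50.0]))
    print(classify_latency(cpu, [30.0, 10.0, 50.0, 10.0, 30.0]))
```

A learned classifier would replace the two fixed thresholds with decision boundaries fitted to labeled examples; the sketch only shows what distinguishes the three classes.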
Recursion in Scalable Protocols via Distributed Data Flows
Languages for Distributed Algorithms (2012) (to appear)
This paper proposes a new approach to representing scalable hierarchical distributed multi-party protocols and reasoning about their behavior. The established endpoint-to-endpoint message-passing abstraction provides little support for modeling distributed algorithms in hierarchical systems in which the hierarchy and membership evolve dynamically. This paper explains how, with our new Distributed Data Flow (DDF) abstraction, a hierarchical architecture can be modeled via recursion in the language. This facilitates more concise code and enables automated generation of scalable hierarchical implementations for heterogeneous network environments.
Diagnosing Latency in Multi-Tier Black-Box Services
Gideon Mann
5th Workshop on Large Scale Distributed Systems and Middleware (LADIS 2011) (to appear)
As multi-tier cloud applications become pervasive, we need better tools for understanding their performance. This paper presents a system that analyzes observed or desired changes to end-to-end latency profile in a large distributed application, and identifies their underlying causes. It recognizes changes to system configuration, workload, or performance of individual services that lead to the observed or desired outcome. Experiments on an industrial datacenter demonstrate the utility of the system.