Krzysztof Ostrowski

Krzysztof Ostrowski

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract End-to-end latency of serving jobs in distributed and shared environments, such as a Cloud, is an important metric for jobs' owners and infrastructure providers. Yet it is notoriously challenging to model precisely, since it is affected by a large collection of unrelated moving pieces, from the software design to the job schedulers strategies. In this work we present a novel approach to modeling latency, by tracking how it varies with CPU usage. We train a classifier to automatically assign the latency behavior of methods in three classes: constant latency regardless of CPU, uncorrelated latency and CPU, and predictable latency as a function of CPU. We use our model on a random sample of serving jobs running on the Google infrastructure. We illustrate unexpected and insightful patterns of latency variations with CPU. The visualization of latency-CPU variations and the corresponding class may be used by both jobs' owners and infrastructure providers, for a variety of applications, such as smarter latency alerting, latency-aware configuration of jobs, and automated detection of changes in behavior, either over time, during pre-release testing, or across data centers. View details
    Recursion in Scalable Protocols via Distributed Data Flows
    Languages for Distributed Algorithms(2012) (to appear)
    Preview abstract This paper proposes a new approach to representing scalable hierarchical distributed multi-party protocols, and reasoning about their behavior. The established endpoint-to-endpoint message-passing abstraction provides little support for modeling distributed algorithms in hierarchical systems, in which the hierarchy and membership dynamically evolve. This paper explains how with our new Distributed Data Flow (DDF) abstraction, hierarchical architecture can be modeled via recursion in the language. This facilitates a more concise code, and it enables automated generation of scalable hierarchical implementations for heterogeneous network environments. View details
    Diagnosing Latency in Multi-Tier Black-Box Services
    Gideon Mann
    5th Workshop on Large Scale Distributed Systems and Middleware (LADIS 2011) (to appear)
    Preview abstract As multi-tier cloud applications become pervasive, we need better tools for understanding their performance. This paper presents a system that analyzes observed or desired changes to end-to-end latency pro le in a large distributed application, and identi fies their underlying causes. It recognizes changes to system con guration, workload, or performance of individual services that lead to the observed or desired outcome. Experiments on an industrial datacenter demonstrate the utility of the system. View details
    Self-Replicating Objects for Multicore Platforms
    Chuck Sakoda
    Ken Birman
    24th European Conference on Object-Oriented Programming (ECOOP 2010)
    Preview abstract The paper introduces Self-Replicating Objects (SROs), a new concurrent programming abstraction. An SRO is implemented and used much like an ordinary .NET object and can expose arbitrary user-defined APIs, but it is aggressive about automatically exploiting multicore CPUs. It does so by spontaneously and transparently partitioning its state into a set of replicas that handle method calls in parallel and automatically merging replicas before processing calls that cannot execute in the replicated state. Developers need not be concerned about protecting access to shared data; each replica is a monitor and has its own state. The runtime ensures proper synchronization, scheduling, decides when to split/merge, and can transparently migrate replicas to other processes to decrease contention. Compared to threads/locks or toolkits such as .NET Parallel Extensions, SROs offer a simpler, more versatile programming model while delivering comparable, and in some cases even higher performance. View details
    Enabling Tactical Edge Mashups with Live Objects.
    Daniel Freedman
    Ken Birman
    Mark Linderman
    Robert Hillman
    Albert Frantz
    15th International Command and Control Research and Technology Symposium (ICCRTS 2010)
    Preview abstract We introduce the Live Objects framework, which leverages our distributed object-oriented programming model and enables tactical edge mashups for battlefield command and control. Unlike most deployed web services, which are typically limited to client-server interactions, Live Objects can simultaneously support multiple patterns of communication, including direct client-to-client protocols. This means that when clients are forward deployed or accessible only through disadvantaged links, a Live Objects system can remain highly responsive, whereas more standard solutions might slow down precipitously, become unresponsive, or fail outright. Here, we summarize the approach and then suggest that, when using it, a new kind of Service-Oriented Collaboration (SOC) application can be created that will combine direct client-to-client sharing of imagery, videos, or other real-time data captured in the field, with service-hosted data, including geographic information systems, weather prediction systems, social networks, and other databases. The client-to-client solutions can include powerful new collaboration features implemented to help the user achieve tactical Command and Control (C2) goals that require split-second coordination, for which reach-back to a server might be impossibly slow. We showcase key properties of our platform through a functional, proof-of-concept Combat Search and Rescue (CSAR) scenario. View details
    Object-Oriented Architecture for Web Services Eventing
    Ken Birman
    Danny Dolev
    Web Services Research for Emerging Applications: Discoveries and Trends(2010), pp. 140-184
    Preview abstract Existing web service notification and eventing standards are useful in many applications, but they have serious limitations that make them ill-suited for large-scale deployments, or as a middleware or a component-integration technology in today’s data centers. For example, it is not possible to use IP multicast, or for recipients to forward messages to others, scalable notification trees must be setup manually, and no end-to-end security, reliability, or QoS guarantees can be provided. This chapter proposes an architecture based on object-oriented design principles that is free of such limitations, extremely modular and extensible, and that can serve as a basis for extending and complementing the existing standards. The new approach emerges from the authors’ work on Live Distributed Objects, a new programming model that brings object-orientation into the realm of distributed computing. View details
    Edge Mashups for Service-Oriented Collaboration
    Ken Birman
    Jared Cantwell
    Daniel Freedman
    Qi Huang
    Petko Nikolov
    Journal Computer, 42(2009)
    Preview abstract The Live Distributed Objects platform makes it possible to combine hosted content with P2P protocols in a single object-oriented framework. View details
    Building Collaboration Applications that Mix Web Services Hosted Content with P2P Protocols
    Ken Birman
    Jared Cantwell
    Daniel Freedman
    Qi Huang
    Petko NIkolov
    7th IEEE International Conference on Web Services (ICWS 2009)
    Preview abstract The most commonly deployed web service applications employ client-server communication patterns, with clients running remotely and services hosted in data centers. In this paper, we make the case for Service-Oriented Collaboration applications that combine service-hosted data with collaboration features implemented using peer-to-peer protocols. Collaboration features are awkward to support solely based on the existing web services technologies. Indirection through the data center introduces high latencies and limits scalability, and precludes collaboration between clients connected to one-another but lacking connectivity to the data center. Cornell's Live Distributed Objects platform combines web services with direct peer-to-peer communication to eliminate these issues. View details
    Programming Live Distributed Objects with Distributed Data Flows
    Ken Birman
    Danny Dolev
    Cornell University(2009)
    Preview abstract This paper presents a new object-oriented approach to modeling the semantics of distributed multi-party protocols such as leader election, distributed locking, or reliable multicast, and a programming language that supports it. The approach builds on and extends our live distributed objects model by introducing a new concept of a distributed flow, a stream of messages flowing concurrently at multiple locations. Our flows correspond to variables, private fields, and method parameters in Java-like languages; they are the means by which we store and communicate state. Active protocol instances, which correspond to Java objects, consume and output flows; their internal states are encapsulated as internal flows, and all of their internal logic is represented as operations on flows. Our language supports a new type of concern separation: the semantic structure of protocols is decoupled from implementation details such as the construction and maintenance of overlays, trees, or other hierarchical structures needed for scalability. The latter can be addressed by the compiler or at the deployment time; it can be done differently in different parts of the network, to match local network characteristics. The paper introduces the basic language concepts, syntax, and semantics, illustrating formal definitions with a discussion of example protocols such as leader election, distributed locking, agreement, and loss recovery. It shows examples of rules for a formal reasoning about programs in our language. While full implementation details of the supporting compiler and runtime are beyond the scope of this paper, we do briefly describe how our new language primitives can be implemented. Our approach is practical: the core language constructs, including hierarchical monotonic aggregations, have been implemented and evaluated in a simulator. The full compiler framework is in preparation and will be publicly released as a part of our live distributed objects platform. View details
    Extending Service-Oriented Architecture with Hierarchical Composition of Client-Side Asynchronous Event-Processing Logic
    Ken Birman
    7th IEEE International Conference on Web Services (ICWS 2009)
    Preview abstract There is a growing need for a new type of WS-*/SOA standards that could facilitate hierarchical, object-oriented composition of client-side executable code. This is especially true for the sorts of client-side logic embedded in AJAX and rich Internet applications, virtual worlds and MMORPGs; code that deals with issuing requests to servers, processing their responses, rendering UI, interacting with users, and processing asynchronous events from other client nodes. The paper offers an analysis of client-side composition patterns, a brief explanation why they lack adequate support from the existing web technologies, and design guidelines for client-side component integration environments to follow. The proposed guidelines have been successfully implemented in a prototype system. Our analysis is thus strongly rooted in reality; it is based on real experiences with concrete application scenarios. The paper concludes by highlighting the key architectural aspects of our implementation with respect to the principles listed earlier. View details
    Storing and Accessing Live Mashup Content in the Cloud
    Ken Birman
    ACM SIGOPS Operating Systems Review, 44(2009)
    Preview abstract Today's Rich Internet Application (RIA) technologies such as Ajax, Flex, or Silverlight, are designed around the client-server paradigm and cannot easily take advantage of replication, publish-subscribe, or peer-to-peer mechanisms for better scalability or responsiveness. This is particularly true of storage: content is typically persisted in data centers and consumed via web services. We propose1 a checkpointed channel (CC) abstraction as an alternative model for storing and accessing content. CCs are architecture-agnostic: they could be implemented as web services, but also as replicated state machines running over peer-to-peer multicast protocols. They can seamlessly span across the data center boundaries, or live at the edge. They are a more natural way of consuming streaming content. CCs can store hierarchical documents with hyperlinks to other CCs, thus forming a web of interconnected CCs: a live scalable information space. We discuss the advantages of the new abstraction, challenges it poses, and the way it fits within the existing models for RIA development. View details
    Distributed Data Flow Language for Multi-Party Protocols
    Ken Birman
    Danny Dolev
    5th Workshop on Programming Languages and Operating Systems (PLOS 2009)
    Preview abstract This paper presents a novel object-oriented approach to modeling the semantics of distributed multi-party protocols such as leader election, distributed locks or reliable multicast, and a programming language that supports it. The approach extends our live distributed objects (LO) model with the new concept of a distributed flow (DF), a stream of events that flow concurrently at multiple locations. DFs correspond to local variables, private fields, and method parameters in Java-like languages; they're means by which one stores and communicates state. Protocol instances correspond to Java objects; they consume and output flows; their internal states are encapsulated as internal flows, and their internal logic is represented as operations on flows. Our language provides a new type of concern separation: the semantic structure of protocols is decoupled from implementation details such as construction and maintenance of overlays, trees, and other structures used for scalability. These can be generated by the compiler or at deployment time. This can be done differently in different parts of the network, to match the local environment. View details
    Implementing Reliable Event Streams in Large Systems via Distributed Data Flows and Recursive Delegation
    Ken Birman
    Danny Dolev
    Chuck Sakoda
    3rd ACM International Conference on Distributed Event-Based Systems (DEBS 2009)
    Preview abstract Strong reliability properties, such as state machine replication and virtual synchrony, are hard to implement in a scalable manner. They are typically expressed in terms of global membership views. However, global membership is non-scalable. We propose a new way of modeling protocols that does not rely on global membership. Our approach is based on the concept of a distributed data flow, a set of events distributed in space and time. We model protocols as networks of such flows, constructed through recursive delegation. The resulting system uses multiple small membership services instead of a single global service while still supporting stronger properties. This paper focuses on the theoretical model and its base properties; in particular, on the concept of monotonic aggregation. We present a high-level architecture overview and initial performance results. View details
    Using Live Distributed Objects for Office Automation
    Jong Hoon Ahnn
    Ken Birman
    Robbert van Renesse
    ACM/IFIP/USENIX Middleware 2008 Conference Companion (Companion 2008)
    Preview abstract Web services and platforms such as .NET make it easy to integrate interactive end-user applications with backend services. However, it remains hard to build collaborative applications in which information is shared within teams. This paper introduces a new drag-and-drop technology, in which standard office documents (spreadsheets, databases, etc.) are interconnected with event-driven middleware ("live distributed objects"), to create distributed applications in which changes to underlying data propagate quickly to downstream applications. Information is replicated in a consistent manner, making it easy for team members to share updates and to coordinate their actions. We present our middleware platform, and show that it offers good performance and scalability, with small resource footprint. Moreover, because the approach is highly automated, and the underlying middleware is highly configurable, we're in a position to automatically address security and reliability needs that might otherwise be onerous. In addition to reviewing our existing system, we list open issues, which include integration with external data sources, and updating stored, but inactive objects. View details
    Live Distributed Objects
    Cornell University, 76 Ninth Avenue(2008)
    Preview abstract Distributed multiparty protocols such as multicast, atomic commit, or gossip are currently underutilized, but we envision that they could be used pervasively, and that developers could work with such protocols similarly to how they work with CORBA/COM/.NET/Java objects. We have created a new programming model and a platform in which protocol instances are represented as objects of a new type called live distributed objects: strongly-typed building blocks that can be composed in a type-safe manner through a drag and drop interface. Unlike most prior object-oriented distributed protocol embeddings, our model appears to be flexible enough to accommodate most popular protocols, and to be applied uniformly to any part of a distributed system, to build not only front-end, but also back-end components, such as multicast channels, naming, or membership services. While the platform is not limited to applications based on multicast, it is replication-centric, and reliable multicast protocols are important building blocks that can be used to create a variety of scalable components, from shared documents to fault-tolerant storage or scalable role delegation. We propose a new multicast architecture compatible with the model and designed in accordance with object-oriented principles such as modularity and encapsulation. The architecture represents multicast objects as compositions of smaller objects, glued using a decentralized membership mechanism. The benefits include support for heterogeneous networks by allowing a single multicast protocol instance to simultaneously leverage different mechanisms in different parts of the Internet. Finally, we describe and evaluate a scalable reliable multicast prototype that incorporates a novel approach to scaling with the number of protocol instances by leveraging regular patterns of overlap. The system achieves exceptional performance, scalability, and stability in the presence of perturbations, in part thanks to a novel application of techniques such as priority scheduling or pull-based processing. We describe a previously unnoticed relationship between memory overheads and scheduling and the performance and scalability of a reliable multicast protocol. Our results demonstrate that to build a new global Web operating system that the live objects vision leads to, the distributed protocol and the local system architecture cannot be treated in isolation. View details
    Programming with Live Distributed Objects
    Ken Birman
    Danny Dolev
    Jong Hoon Ahnn
    22nd European conference on Object-Oriented Programming (ECOOP 2008)
    Preview abstract A component revolution is underway, bringing developers improved productivity and opportunities for code reuse. However, whereas existing tools work well for builders of desktop applications and client-server structured systems, support for other styles of distributed computing has lagged. In this paper, we propose a new programming paradigm and a platform, in which instances of distributed protocols are modeled as "live distributed objects". Live objects can represent both protocols and higher-level components. They look and feel much like ordinary objects, but can maintain shared state and synchronization across multiple machines within a network. Live objects can be composed in a type-safe manner to build sophisticated distributed applications using a simple, intuitive drag and drop interface, very often without writing any code or having to understand the intricacies of the underlying distributed algorithms. View details
    QuickSilver Scalable Multicast (QSM)
    Ken Birman
    Danny Dolev
    7th IEEE International Symposium on Network Computing and Applications (NCA 2008)
    Preview abstract QSM is a multicast engine designed to support a style of distributed programming in which application objects are replicated among clients and updated via multicast. The model requires platforms that scale in dimensions previously unexplored; in particular, to large numbers of multicast groups. Prior systems we-ren’t optimized for such scenarios and can’t take ad-vantage of regular group overlap patterns, a key fea-ture of our application domain. Furthermore, little is known about performance and scalability of such sys-tems in modern managed environments. We shed light on these issues and offer architectural insights based on our experience building QSM. View details
    Extensible Architecture for High-Performance, Scalable, Reliable Publish-Subscribe Eventing and Notification
    Ken Birman
    Danny Dolev
    International Journal of Web Services Research, 4(2007)
    Preview abstract Existing Web service notification and eventing standards are useful in many applications, but they have serious limitations that make them ill-suited for large-scale deployments, or as a middleware or a component- integration technology in today’s data centers. For example, it is not possible to use IP multicast, or for recipients to forward messages to others, scalable notification trees must be setup manually, and no end-to-end security, reliability, or QoS guarantees can be provided. We propose an architecture that is free of such limitations and that may serve as a basis for extending or complementing the existing standards. The approach emerges from our work on QuickSilver, a new, extremely modular and extensible platform for high-performance, scalable, reliable eventing. View details
    Scalable Multicast Platforms for a New Generation of Robust Distributed Applications
    Ken Birman
    Mahesh Balakrishnan
    Danny Dolev
    Tudor Marian
    Amar Phanishayee
    2nd IEEE/Create-Net/ICST International Conference on Communication System software and Middleware (COMSWARE)(2007)
    Preview abstract As distributed systems scale up and are deployed into increasingly sensitive settings, demand is rising for a new generation of communications middleware in support of application-level critical-computing uses. Ricochet, Tempest and Quicksilver are multicast-based systems developed to respond to this need. Ricochet and Quicksilver are multicast platforms; both are exceptionally scalable and support fault-tolerance properties that match closely with the needs of high-availability applications. Ricochet was designed to support time-critical applications replicated for scalability on data centers and clusters. These are typically coded in Java and run under Linux. Tempest is layered over Ricochet and automates most tasks of programming services for data centers. In contrast, Quicksilver focuses on high throughput and is targeted towards very large deployments of desktop computing systems, in support of publish-subscribe, event notification or media dissemination applications. In this paper we offer an overview of the systems and some of the new systems embeddings that, we believe, make them far easier to use than was the case in prior multicast platforms. View details
    Live Distributed Objects: Enabling the Active Web
    Ken Birman
    Danny Dolev
    IEEE Internet Computing, 11(2007)
    Preview abstract Distributed computing has lagged behind the productivity revolution that has transformed the desktop in recent years. Programmers still generally treat the Web as a separate technology space and develop network applications using low-level message-passing primitives or unreliable Web services method invocations. Live distributed objects are designed to offer developers a scalable multicast infrastructure that's tightly integrated with a runtime environment. View details
    Implementing Scalable Publish-Subscribe in a Managed Runtime Environment
    Ken Birman
    Large-Scale Distributed Systems and Middleware (LADIS 2007)
    Preview abstract The reliable multicast, publish-subscribe, and group communication paradigms are highly effective in support of replication and event notification, and could serve as the enabling technologies for new types of applications that are both interactive and decentralized. To fully realize this vision, we need a high-performance, scalable, and reliable multicast engine, as an integral part of the runtime environment. Since the majority of development today is done in managed, strongly-typed environments such as Java or.NET, integration with such environments is of particular importance. What factors limit performance and scalability of a reliable multicast engine in a managed environment? What support from the runtime could improve performance, avoid instabilities, or make such systems easier to build? We focused on answering these questions by analyzing the performance of QuickSilver Scalable Multicast (QSM), a new multicast protocol and a system we’ve built entirely in.NET. We found that memory-related overheads and scheduling-related phenomena dominate the behavior of our system, and that most problems can be alleviated by techniques such as restructuring the protocol stack to limit caching and buffering, and introducing a custom, priority-based scheduling policy. View details
    Exploiting Gossip for Self-Management in Scalable Event Notification Systems
    Ken Birman
    Anne-Marie Kermarrec
    Martin Bertier
    Danny Dolev
    Robbert van Renesse
    27th International Conference on Distributed Computing Systems Workshops (ICDCSW 2007)
    Preview abstract Challenges of scale have limited the development of event notification systems with strong properties, despite the urgent demand for consistency, reliability, security, and other guarantees in applications developed for sensitive tasks in large enterprises. These issues are the focus of Quicksilver, a new multicast platform targeted to largescale deployments. An initial version of the system can support large numbers of overlapping multicast groups, high data rates and groups with large numbers of members. However, Quicksilver still requires manual help when discovering the system configuration and can't easily enforce certain types of application monitoring and integrity constraints. In this paper, we propose to extend Quicksilver by introducing gossip mechanisms, yielding a self-managed event notification platform. The two technologies are presented through a single interface and appear to end users as live distributed objects, side-byside with other kinds of typed components. View details
    Scalable Group Communication System for Scalable Trust
    Ken Birman
    1st ACM Workshop on Scalable Trusted Computing (STC 2006)
    Preview abstract Programmers of large-scale trusted systems need tools to simplify tasks such as replicating services or data. Group communication systems achieve this via various flavors of reliable multicast, but the existing solutions do not scale in all major dimensions. Typically, they scale poorly in the number of groups; yet we believe that using groups casually could lead to new, easier ways of programming. We propose QSM [1], a new multicast substrate that scales in several dimensions at once. Our approach relies on a novel way of exploiting the overlap between groups. View details
    QuickSilver Properties Framework
    Ken Birman
    Danny Dolev
    OSDI(2006)
    Extensible Web Services Architecture for Notification in Large-Scale Systems
    Ken Birman
    IEEE International Conference on Web Services (ICWS 2006)
    Preview abstract Existing Web services notification and eventing standards are useful in many applications, but they have serious limitations precluding large-scale deployments: it is impossible to use IP multicast or for recipients to forward messages to others and scalable notification trees must be setup manually. We propose a design free of such limitations that could serve as a basis for extending or complementing these standards. The approach emerges from our prior work on QSM (Ostrowski et al., 2006), a new Web services eventing platform that can scale to extremely large environments. View details