Calin Cascaval
Calin Cascaval's research at Google focuses on large scale distributed systems. For his previous experience and list of publications, see his personal web page.
Authored Publications
We introduce logical synchrony, a framework that allows distributed computing to be coordinated as tightly as with pure synchrony without the distribution of a global clock or any reference to a universal time. We describe and prove the main properties of the framework and point to how processes can be executed on a logically synchronous system.
We discuss distributed reset control of bittide systems. In a bittide system, multiple processors communicate over a network. The processors remain in logical synchrony by controlling the timing of frame transmissions. The protocol for doing this relies upon an underlying dynamic control system, where each node makes only local observations and performs no direct coordination with other nodes. In this paper we develop a control algorithm based on the idea of reset control, which allows all nodes to maintain small buffer offsets while also requiring very little state information at each node. This offers the potential for simplified boot processes and failure handling.
We discuss control of bittide distributed systems. The bittide mechanism is designed to provide logical synchronization between machines on a network by observing and controlling data transfer between systems at the physical layer of the network. In this paper we analyze the performance of approximate proportional-integral control of these systems. We develop a simple continuous-time model for the dynamics, and show that the resulting dynamics are stable for any positive choice of gains. We construct explicit formulae for the closed-loop performance of the system, measured using the L2 norm. We show that the performance is a product of two terms, one depending only on the resistance distances in the graph, and the other depending only on the controller gains.
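To give a feel for this kind of model, here is a minimal Python sketch. It is not the paper's formulation. It assumes a common consensus-style model: each node's clock phase theta[i] is counted in frames, the occupancy offset of the buffer at node i fed by neighbour j is theta[j] - theta[i], and each node applies an ideal (not approximate) proportional-integral correction to its frequency based only on the sum of its own buffer offsets. The function name `simulate_pi_sync`, the gains `kp` and `ki`, and the forward-Euler step are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_pi_sync(adjacency, free_freqs, kp=1.0, ki=0.5, dt=1e-2, steps=10_000):
    """Forward-Euler simulation of an ASSUMED continuous-time PI clock-control model.

    theta[i] is node i's clock phase in frames. The buffer at node i fed by
    neighbour j is assumed to hold theta[j] - theta[i] frames beyond its
    midpoint, so node i observes s[i] = -(L @ theta)[i], where L is the
    graph Laplacian. Node i runs at
        freq[i] = free_freqs[i] + kp * s[i] + ki * integral(s[i]).
    Returns the final frequencies and the final summed buffer offsets.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian of the network
    omega_u = np.asarray(free_freqs, dtype=float)
    theta = np.zeros_like(omega_u)           # clock phases (frames)
    z = np.zeros_like(omega_u)               # integral of each node's summed offset
    for _ in range(steps):
        s = -(L @ theta)                      # purely local observation at each node
        freq = omega_u + kp * s + ki * z      # PI-corrected clock frequency
        theta = theta + dt * freq             # advance phases
        z = z + dt * s                        # accumulate integral term
    s = -(L @ theta)
    return omega_u + kp * s + ki * z, s


# Usage: a 4-node path network whose free-running clocks disagree slightly.
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
freqs, offsets = simulate_pi_sync(path4, [1.0, 1.02, 0.99, 1.01])
```

In this assumed model each nonzero Laplacian eigenvalue lambda gives a mode obeying x'' + kp*lambda*x' + ki*lambda*x = 0. All of its coefficients are positive whenever kp, ki > 0, which matches the abstract's claim of stability for any positive gains. With equal gains at every node, the frequencies settle at the mean of the free-running frequencies, and the integral term drives the buffer offsets back to zero.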
Clock synchronization in distributed systems is critical for data center applications, and maintaining tight synchronization is a challenging problem. In this paper we introduce a distributed system design that removes the need for physical clock synchronization and instead provides applications with a tightly synchronized logical clock. We discuss the abstract frame model (AFM), a mathematical model that underpins the system synchronization. It is based on the rate of communication between nodes in a topology and does not require a global clock. We demonstrate that the system remains logically synchronized as long as the processor clocks remain approximately synchronized with respect to real (wall-clock) time. We show that there are families of controllers that satisfy the properties required for existence and uniqueness of solutions to the AFM, and give examples.
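The AFM itself is defined in the paper. The sketch below only illustrates the intuition that a frame-counting design rests on, under stated assumptions: a hypothetical two-node link where node A sends one frame per tick of its local clock and node B removes one frame per tick of its own clock, with link latency ignored. The function `buffer_occupancy` and its `initial` fill level are illustrative names, not part of the paper.

```python
import math


def buffer_occupancy(freq_a, freq_b, t, initial=8):
    """Frames queued at node B from node A at wall-clock time t (seconds).

    Hypothetical setup: A sends one frame per tick of its clock (freq_a Hz),
    B consumes one frame per tick of its clock (freq_b Hz), and link latency
    is ignored. A negative result means B's buffer would have underflowed.
    """
    frames_sent = math.floor(freq_a * t)      # ticks A has completed by time t
    frames_consumed = math.floor(freq_b * t)  # ticks B has completed by time t
    return initial + frames_sent - frames_consumed
```

For example, `buffer_occupancy(101.0, 100.0, 10)` returns 18: a 1 Hz mismatch adds ten frames over ten seconds. Because occupancy drifts at the rate freq_a - freq_b, it stays bounded only when the two clocks run at the same average rate. That is why the controllers need to keep the processor clocks approximately synchronized, even though no global clock is distributed.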