Chip Killian

Since December 2012, I have been a Software Engineer at Google, working on Google's Platforms Networking team. Prior to that, I was an assistant professor of Computer Science at Purdue University from 2008 to 2012, after completing my PhD at UCSD in 2008 under Professor Amin Vahdat. While at Purdue, I won an NSF CAREER Award and an HP Open Innovation Award.

Broadly, my research is in distributed systems, with a strong focus on the technologies and techniques that make it easier to design, develop, test, and debug distributed systems. My research started on the MACEDON project for building overlay networks, and evolved through the Mace project and the MaceMC model checker (Best Paper Award, NSDI 2007). While we continue to work on Mace and its extensions, our research has also grown to include data mining and log analysis to detect and diagnose problems in systems, as well as automated testing of systems under a variety of malicious conditions.

Purdue Page: http://www.cs.purdue.edu/homes/ckillian/
Purdue Research Group: http://www.macesystems.org/
Homepage: http://chip.kcubes.com/

Authored Publications
    Orion: Google’s Software-Defined Networking Control Plane
    Amin Vahdat
    Amr Sabaa
    Arjun Singh
    Henrik Muehe
    Joon Suan Ong
    Karthik Swaminathan Nagaraj
    KondapaNaidu Bollineni
    Lorenzo Vicisano
    Mike Conley
    Min Zhu
    Rich Alimi
    Shawn Chen
    Shidong Zhang
    Waqar Mohsin
    (2021)
    We present Orion, a distributed Software-Defined Networking platform deployed globally in Google’s datacenter (Jupiter) as well as Wide Area (B4) networks. Orion was designed around a modular, micro-service architecture with a central publish-subscribe database to enable a distributed, yet tightly-coupled, software-defined network control system. Orion enables intent-based management and control, is highly scalable and amenable to global control hierarchies. Over the years, Orion has matured with continuously improving performance in convergence (up to 40x faster), throughput (handling up to 1.16 million network updates per second), system scalability (supporting 16x larger networks), and data plane availability (50x, 100x reduction in unavailable time in Jupiter and B4, respectively) while maintaining high development velocity with bi-weekly release cadence. Today, Orion robustly enables all of Google’s Software-Defined Networks defending against failure modes that are both generic to large scale production networks as well as unique to SDN systems.
    EventWave: Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications
    Wei-Chiu Chuang
    Bo Sang
    Sunghwan Yoo
    Rui Gu
    Milind Kulkarni
    Proceedings of the 2013 ACM Symposium on Cloud Computing, ACM, Santa Clara, CA, USA
    An attractive approach to leveraging the ability of cloud-computing platforms to provide resources on demand is to build elastic applications, which can dynamically scale up or down based on resource requirements. To ease the development of elastic applications, it is useful for programmers to write applications with simple sequential semantics, without considering elasticity, and rely on runtime support to provide that elasticity. While this approach has been useful in restricted domains, such as MapReduce, existing programming models for general distributed applications do not expose enough information about their inherent organization of state and computation to provide such transparent elasticity. We introduce EVENTWAVE, an event-driven programming model that allows developers to design elastic programs with inelastic semantics while naturally exposing isolated state and computation with programmatic parallelism. In addition, we describe the runtime mechanism which takes the exposed parallelism to provide elasticity. Finally, we evaluate our implementation through microbenchmarks and case studies to demonstrate that EVENTWAVE can provide efficient, scalable, transparent elasticity for applications run in the cloud.
    Live Debugging of Distributed Systems
    Darren Dao
    Jeannie R. Albrecht
    Amin Vahdat
    CC (2009), pp. 94-108
    Building Distributed Systems Using Mace
    James W. Anderson
    Ryan Braud
    Ranjit Jhala
    Amin Vahdat
    Peer-to-Peer Computing (2009), pp. 91-92
    High-bandwidth data dissemination for large-scale distributed systems
    Dejan Kostic
    Alex C. Snoeren
    Amin Vahdat
    Ryan Braud
    James W. Anderson
    Jeannie R. Albrecht
    Adolfo Rodriguez
    Erik Vandekieft
    ACM Trans. Comput. Syst., 26 (2008)
    Mace: language support for building distributed systems
    James W. Anderson
    Ryan Braud
    Ranjit Jhala
    Amin Vahdat
    PLDI (2007), pp. 179-188
    Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code (Awarded Best Paper)
    James W. Anderson
    Ranjit Jhala
    Amin Vahdat
    NSDI (2007)
    Pip: Detecting the Unexpected in Distributed Systems
    Patrick Reynolds
    Janet L. Wiener
    Mehul A. Shah
    Amin Vahdat
    NSDI (2006)
    Maintaining High-Bandwidth Under Dynamic Network Conditions
    Dejan Kostic
    Ryan Braud
    Erik Vandekieft
    James W. Anderson
    Alex C. Snoeren
    Amin Vahdat
    USENIX Annual Technical Conference, General Track (2005), pp. 193-208
    Brief announcement: the overlay network content distribution problem
    Alex C. Snoeren
    Amin Vahdat
    Joseph Pasquale
    PODC (2005), p. 98
    MACEDON: Methodology for Automatically Creating, Evaluating, and Designing Overlay Networks
    Adolfo Rodriguez
    Sooraj Bhat
    Dejan Kostic
    Amin Vahdat
    NSDI (2004), pp. 267-280