Jeffrey C. Mogul

Jeff Mogul works on fast, cheap, reliable, and flexible networking infrastructure for Google. Until 2013, he was a Fellow at HP Labs, doing research primarily on computer networks and operating systems issues for enterprise and cloud computer systems; previously, he worked at the DEC/Compaq Western Research Lab. He received his PhD from Stanford in 1986, an MS from Stanford in 1980, and an SB from MIT in 1979. He is an ACM Fellow. Jeff is the author or co-author of several Internet Standards; he contributed extensively to the HTTP/1.1 specification. He was an associate editor of Internetworking: Research and Experience, and has been the chair or co-chair of a variety of conferences and workshops, including SIGCOMM, OSDI, NSDI, USENIX, HotOS, and ANCS. You can find a mostly up-to-date CV at http://jmogul.com/mogulcv.pdf
Authored Publications
    Physical Deployability Matters
    Proc. HotNets 2023: Twenty-Second ACM Workshop on Hot Topics in Networks
    While many network research papers address issues of deployability, with a few exceptions, this has been limited to protocol compatibility or switch-resource constraints, such as flow table sizes. We argue that good network designs must also consider the costs and complexities of deploying the design within the constraints of the physical environment in a datacenter: physical deployability. The traditional metrics of network "goodness" mostly do not account for these costs and constraints, and this may partially explain why some otherwise attractive designs have not been deployed in real-world datacenters.
    Change Management in Physical Network Lifecycle Automation
    Mo Alfares
    Virginia Beauregard
    Kevin Grant
    Angus Griffith
    Jahangir Hasan
    Chen Huang
    Quan Leng
    Jiayao Li
    Alexander Lin
    Zhoutao Liu
    Ahmed Mansy
    Bill Martinusen
    Nikil Mehta
    Andrew Narver
    Anshul Nigham
    Melanie Obenberger
    Sean Smith
    Kurt Steinkraus
    Sheng Sun
    Edward Thiele
    Amin Vahdat
    Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
    Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain -- additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc. We especially have learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages) while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support. This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management: (1) managing conflicts between multiple operations on the same network; (2) managing conflicts between operations spanning the boundaries between networks; (3) managing representational changes in the models that drive our automated systems. These approaches combine both novel software systems and software-engineering practices. While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
    We (Google's networking teams) would like to increase our collaborations with academic researchers related to data-driven networking research. There are some significant constraints on our ability to directly share data, and in case not everyone in the community understands these, this document provides a brief summary. There are some models which can work (primarily, interns and visiting scientists). We describe some specific areas where we would welcome proposals to work within those models.
    Cores that don't count
    Parthasarathy Ranganathan
    Amin Vahdat
    Proc. 18th Workshop on Hot Topics in Operating Systems (HotOS 2021)
    We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, we have observed ephemeral computational errors that were not detected during manufacturing tests. These defects cannot always be mitigated by techniques such as microcode updates, and may be correlated to specific components within the processor, allowing small code changes to effect large shifts in reliability. Worse, these failures are often "silent": the only symptom is an erroneous computation. We refer to a core that develops such behavior as "mercurial." Mercurial cores are extremely rare, but in a large fleet of servers we can observe the correlated disruption they cause, often enough to see them as a distinct problem -- one that will require collaboration between hardware designers, processor vendors, and systems software architects. This paper is a call-to-action for a new focus in systems research; we speculate about several software-based approaches to mercurial cores, ranging from better detection and isolating mechanisms, to methods for tolerating the silent data corruption they cause.
    GEMINI: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering
    Mingyang Zhang
    Jianan Zhang
    Rui Wang
    Ramesh Govindan
    Amin Vahdat
    (2021)
    To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network infrequently, rendering these blocking designs practical enough for deployment in the near future. The key to Gemini is the joint optimization of topology and routing, using as input a robust estimation of future traffic derived from multiple historical traffic matrices. Gemini “hedges” against unpredicted bursts, by spreading these bursts across multiple paths, to minimize packet loss in exchange for a small increase in path lengths. It incorporates a robust decision algorithm to determine when to reconfigure, and whether to use hedging. Data from tens of production fabrics allows us to categorize these as either low- or high-volatility; these categories seem stable. For the former, Gemini finds topologies and routing with near-optimal performance and cost. For the latter, Gemini’s use of multi-traffic-matrix optimization and hedging avoids the need for frequent topology reconfiguration, with only marginal increases in path length. As a result, Gemini can support existing workloads on these production fabrics using a spine-free topology that is half the cost of the existing topology on these fabrics.
    Network management is becoming increasingly automated, and automation depends on detailed, explicit representations of data about both the state of a network, and about an operator’s intent for its networks. In particular, we must explicitly represent the desired and actual topology of a network; almost all other network-management data either derives from its topology, constrains how to use a topology, or associates resources (e.g., addresses) with specific places in a topology. We describe MALT, a Multi-Abstraction-Layer Topology representation, which supports virtually all of our network management phases: design, deployment, configuration, operation, measurement, and analysis. MALT provides interoperability across software systems, and its support for abstraction allows us to explicitly tie low-level network elements to high-level design intent. MALT supports a declarative style that simplifies what-if analysis and testbed support. We also describe the software base that supports efficient use of MALT, as well as numerous, sometimes painful lessons we have learned about curating the taxonomy for a comprehensive, and evolving, representation for topology.
    Minimal Rewiring: Efficient Live Expansion for Clos Data Center Networks
    Shizhen Zhao
    Rui Wang
    Junlan Zhou
    Joon Ong
    Amin Vahdat
    Proc. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2019), USENIX Association (to appear)
    Clos topologies have been widely adopted for large-scale data center networks (DCNs), but it has been difficult to support incremental expansions of Clos DCNs. Some prior work has assumed that it is impossible to design DCN topologies that are both well-structured (non-random) and incrementally expandable at arbitrary granularities. We demonstrate that it is indeed possible to design such networks, and to expand them while they are carrying live traffic, without incurring packet loss. We use a layer of patch panels between blocks of switches in a Clos network, which makes physical rewiring feasible, and we describe how to use integer linear programming (ILP) to minimize the number of patch-panel connections that must be changed, which makes expansions faster and cheaper. We also describe a block-aggregation technique that makes our ILP approach scalable. We tested our "minimal-rewiring" solver on two kinds of fine-grained expansions using 2250 synthetic DCN topologies, and found that the solver can handle 99% of these cases while changing under 25% of the connections. Compared to prior approaches, this solver (on average) reduces the number of "stages" per expansion by about 3.1X -- a significant improvement to our operational costs, and to our exposure (during expansions) to capacity-reducing faults.
    Nines are Not Enough: Meaningful Metrics for Clouds
    Proc. 17th Workshop on Hot Topics in Operating Systems (HotOS)(2019)
    Cloud customers want reliable, understandable promises from cloud providers that their applications will run reliably and with adequate performance, but today, providers offer only limited guarantees, which creates uncertainty for customers. Providers also must define internal metrics to allow them to operate their systems without violating customer promises or expectations. We explore why these guarantees are hard to define. We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We also suggest that defining guarantees in terms of defense against threats, rather than guarantees for application-visible outcomes, can reduce the complexity of these problems. Overall, we offer a partial framework for thinking about Service Level Objectives (SLOs), and discuss some unsolved challenges.
    We increasingly depend on the availability of online services, either directly as users, or indirectly, when cloud-provider services support directly-accessed services. The availability of these "visible services" depends in complex ways on the availability of a complex underlying set of invisible infrastructure services. In our experience, most software engineers lack useful frameworks to create and evaluate designs for individual services that support end-to-end availability in these infrastructures, especially given cost, performance, and other constraints on viable commercial services. Even given the extensive research literature on techniques for replicated state machines and other fault-tolerance mechanisms, we found little help in this literature for addressing infrastructure-wide availability. Past research has often focused on point solutions, rather than end-to-end ones. In particular, it seems quite difficult to define useful targets for infrastructure-level availability, and then to translate these to design requirements for individual services. We argue that, in many but not all ways, one can think about availability with the mindset that we have learned to use for security, and we discuss some general techniques that appear useful for implementing and operating high-availability infrastructures. We encourage a shift in emphasis for academic research into availability.
    Condor: Better Topologies through Declarative Design
    Brandon Schlinker
    Radhika Niranjan Mysore
    Sean Smith
    Amin Vahdat
    Minlan Yu
    Ethan Katz-Bassett
    Michael Rubin
    SIGCOMM '15, Google Inc(2015)
    The design space for large, multipath datacenter networks is large and complex, and no one design fits all purposes. Network architects must trade off many criteria to design cost-effective, reliable, and maintainable networks, and typically cannot explore much of the design space. We present Condor, our approach to enabling a rapid, efficient design cycle. Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures. Condor then uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria. We show that TDL supports concise descriptions of topologies such as fat-trees, BCube, and DCell; that we can generate known and novel variants of fat-trees with simple changes to a TDL file; and that we can synthesize large topologies in tens of seconds. We also show that Condor supports the daunting task of designing multi-phase network expansions that can be carried out on live networks.
    Inferring the Network Latency Requirements of Cloud Tenants
    Ramana Rao Kompella
    15th Workshop on Hot Topics in Operating Systems (HotOS XV), USENIX Association(2015)
    Cloud IaaS and PaaS tenants rely on cloud providers to provide network infrastructures that make the appropriate tradeoff between cost and performance. This can include mechanisms to help customers understand the performance requirements of their applications. Previous research (e.g., Proteus and Cicada) has shown how to do this for network-bandwidth demands, but cloud tenants may also need to meet latency objectives, which in turn may depend on reliable limits on network latency, and its variance, within the cloud provider's infrastructure. On the other hand, if network latency is sufficient for an application, further decreases in latency might add cost without any benefit. Therefore, both tenant and provider have an interest in knowing what network latency is good enough for a given application. This paper explores several options for a cloud provider to infer a tenant's network-latency demands, with varying tradeoffs between requirements for tenant participation, accuracy of inference, and instrumentation overhead. In particular, we explore the feasibility of a hypervisor-only mechanism, which would work without any modifications to tenant code, even in IaaS clouds.
    Flexible Network Bandwidth and Latency Provisioning in the Datacenter
    Vimalkumar Jeyakumar
    Abdul Kabbani
    Amin Vahdat
    arxiv.org(2014)
    Predictably sharing the network is critical to achieving high utilization in the datacenter. Past work has focussed on providing bandwidth to endpoints, but often we want to allocate resources among multi-node services. In this paper, we present Parley, which provides service-centric minimum bandwidth guarantees, which can be composed hierarchically. Parley also supports service-centric weighted sharing of bandwidth in excess of these guarantees. Further, we show how to configure these policies so services can get low latencies even at high network load. We evaluate Parley on a multi-tiered oversubscribed network connecting 90 machines, each with a 10Gb/s network interface, and demonstrate that Parley is able to meet its goals.
    Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions using FlowTags
    Seyed Kaveh Fayazbakhsh
    Luis Chang
    Vyas Sekar
    Minlan Yu
    Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’14), USENIX Association(2014), pp. 533-546
    Middleboxes provide key security and performance guarantees in networks. Unfortunately, the dynamic traffic modifications they induce make it difficult to reason about network management tasks such as access control, accounting, and diagnostics. This also makes it difficult to integrate middleboxes into SDN-capable networks and leverage the benefits that SDN can offer. In response, we develop the FlowTags architecture. FlowTags-enhanced middleboxes export tags to provide the necessary causal context (e.g., source hosts or internal cache/miss state). SDN controllers can configure the tag generation and tag consumption operations using new FlowTags APIs. These operations help restore two key SDN tenets: (i) bindings between packets and their “origins,” and (ii) ensuring that packets follow policy-mandated paths. We develop new controller mechanisms that leverage FlowTags. We show the feasibility of minimally extending middleboxes to support FlowTags. We also show that FlowTags imposes low overhead over traditional SDN mechanisms. Finally, we demonstrate the early promise of FlowTags in enabling new verification and diagnosis capabilities.
    Democratic Resolution of Resource Conflicts Between SDN Control Programs
    Alvin AuYoung
    Yadi Ma
    Sujata Banerjee
    Jeongkeun Lee
    Puneet Sharma
    Yoshio Turner
    Chen Liang
    CoNEXT '14 Proceedings of the 10th ACM International Conference on emerging Networking Experiments and Technologies, ACM(2014), pp. 391-402
    Resource conflicts are inevitable on any shared infrastructure. In Software-Defined Networks (SDNs), different controller modules with diverse objectives may be installed on the SDN controller. Each module independently generates resource requests that may conflict with the objectives of a different module. For example, a controller module for maintaining high availability may want resource allocations that require too much core network bandwidth and thus conflict with another module that aims to minimize core bandwidth usage. In such a situation, it is imperative to identify and install resource allocations that achieve network-wide global objectives that may not be known to individual modules, e.g., high availability with acceptable bandwidth usage. This problem has received only limited attention, with most prior work focused on detecting, avoiding, and resolving rule-level conflicts in the context of OpenFlow. In this paper, we present an automatic resolution mechanism based on a family of voting procedures, and apply it to resolve resource conflicts among SDN and cloud controller programs. We observe that the choice of appropriate resolution mechanism depends on two properties of the deployed modules: their precision and parity. Based on these properties, a network operator can apply a range of resolution techniques. We present two such techniques. Overall, our system promotes modularity and does not require each controller module to divulge its objectives or algorithms to other modules. We demonstrate the improvement in allocation quality over various alternative resolution methods, such as static priorities or equal weight, round-robin decisions. Finally, we provide a qualitative comparison of this work to recent methods based on utility or currency.
    Cicada: Introducing Predictive Guarantees for Cloud Networks
    Katrina LaCurts
    Hari Balakrishnan
    Yoshio Turner
    6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), USENIX Association(2014)
    In cloud-computing systems, network-bandwidth guarantees have been shown to improve predictability of application performance and cost. Most previous work on cloud-bandwidth guarantees has assumed that cloud tenants know what bandwidth guarantees they want. However, as we show in this work, application bandwidth demands can be complex and time-varying, and many tenants might lack sufficient information to request a guarantee that is well-matched to their needs, which can lead to over-provisioning (and thus reduced cost-efficiency) or under-provisioning (and thus poor user experience). We analyze traffic traces gathered over six months from an HP Cloud Services datacenter, finding that application bandwidth consumption is both time-varying and spatially inhomogeneous. This variability makes it hard to predict requirements. To solve this problem, we develop a prediction algorithm usable by a cloud provider to suggest an appropriate bandwidth guarantee to a tenant. With tenant VM placement using these predictive guarantees, we find that the inter-rack network utilization in certain datacenter topologies can be more than doubled.
    FlowTags: Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions
    Seyed Kaveh Fayazbakhsh
    Vyas Sekar
    Minlan Yu
    Proc. ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN), ACM(2013)
    Past studies show that middleboxes are a critical piece of network infrastructure for providing security and performance guarantees. Unfortunately, the dynamic and traffic-dependent modifications induced by middleboxes make it difficult to reason about the correctness of network-wide policy enforcement (e.g., access control, accounting, and performance diagnostics). Using practical application scenarios, we argue that we need a flow tracking capability to ensure consistent policy enforcement in the presence of such dynamic traffic modifications. To this end, we propose FlowTags, an extended SDN architecture in which middleboxes add Tags to outgoing packets, to provide the necessary causal context (e.g., source hosts or internal cache/miss state). These Tags are used on switches and (other) middleboxes for systematic policy enforcement. We discuss the early promise of minimally extending middleboxes to provide this support. We also highlight open challenges in the design of southbound and northbound FlowTags APIs; new control-layer applications for enforcing and verifying policies; and automatically modifying legacy middleboxes to support FlowTags.
    Corybantic: towards the modular composition of SDN control programs
    Alvin AuYoung
    Sujata Banerjee
    Lucian Popa
    Jeongkeun Lee
    Jayaram Mudigonda
    Puneet Sharma
    Yoshio Turner
    Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks (HotNets-XII), ACM(2013)
    Software-Defined Networking (SDN) promises to enable vigorous innovation, through separation of the control plane from the data plane, and to enable novel forms of network management, through a controller that uses a global view to make globally-valid decisions. The design of SDN controllers creates novel challenges; much previous work has focused on making them scalable, reliable, and efficient. However, prior work has ignored the problem that multiple controller functions may be competing for resources (e.g., link bandwidth or switch table slots). Our Corybantic design supports modular composition of independent controller modules, which manage different aspects of the network while competing for resources. Each module tries to optimize one or more objective functions; we address the challenge of how to coordinate between these modules to maximize the overall value delivered by the controllers' decisions, while still achieving modularity.
    ElasticSwitch: practical work-conserving bandwidth guarantees for cloud computing
    Lucian Popa
    Praveen Yalagandula
    Sujata Banerjee
    Yoshio Turner
    Jose Renato Santos
    Proceedings of the ACM SIGCOMM 2013 conference, ACM, pp. 351-362
    While cloud computing providers offer guaranteed allocations for resources such as CPU and memory, they do not offer any guarantees for network resources. The lack of network guarantees prevents tenants from predicting lower bounds on the performance of their applications. The research community has recognized this limitation but, unfortunately, prior solutions have significant limitations: either they are inefficient, because they are not work-conserving, or they are impractical, because they require expensive switch support or congestion-free network cores. In this paper, we propose ElasticSwitch, an efficient and practical approach for providing bandwidth guarantees. ElasticSwitch is efficient because it utilizes the spare bandwidth from unreserved capacity or underutilized reservations. ElasticSwitch is practical because it can be fully implemented in hypervisors, without requiring a specific topology or any support from switches. Because hypervisors operate mostly independently, there is no need for complex coordination between them or with a central controller. Our experiments, with a prototype implementation on a 100-server testbed, demonstrate that ElasticSwitch provides bandwidth guarantees and is work-conserving, even in challenging situations.
    The NIC Is the Hypervisor: Bare-Metal Guests in IaaS Clouds
    Jayaram Mudigonda
    Jose Renato Santos
    Yoshio Turner
    14th Workshop on Hot Topics in Operating Systems (HotOS-XIV), USENIX Association(2013)
    Cloud computing does not inherently require the use of virtual machines, and some cloud customers prefer or even require “bare metal” systems, where no hypervisor separates the guest operating system from the CPU. Even for bare-metal nodes, the cloud provider must find a means to isolate the guest system from other cloud resources, and to manage the instantiation and removal of guests. We argue that an enhanced NIC, together with standard features of modern servers, can provide all of the functions for which a hypervisor would normally be required.
    What we talk about when we talk about cloud network performance
    Lucian Popa
    Computer Communication Review, 42(2012), pp. 44-48
    Infrastructure-as-a-Service ("Cloud") data-centers intrinsically depend on high-performance networks to connect servers within the data-center and to the rest of the world. Cloud providers typically offer different service levels, and associated prices, for different sizes of virtual machine, memory, and disk storage. However, while all cloud providers provide network connectivity to tenant VMs, they seldom make any promises about network performance, and so cloud tenants suffer from highly-variable, unpredictable network performance. Many cloud customers do want to be able to rely on network performance guarantees, and many cloud providers would like to offer (and charge for) these guarantees. But nobody really agrees on how to define these guarantees, and it turns out to be challenging to define "network performance" in a way that is useful to both customers and providers. We attempt to bring some clarity to this question.
    On the Security of Conference and Journal Submission Sites
    Eddie Kohler
    TinyToCS, 1(2012)
    It is well known that many, if not most, people re-use the same password on multiple Web sites [3], even though this practice has been frequently criticized by privacy and security experts. Therefore, Web applications that allow users to choose their own passwords should, at the very least, protect these passwords in transit using SSL [2]. HotCRP is one such Web application. HotCRP is now widely used by computer systems conferences and journals; for example, Tiny ToCS.
    TweeCards: Tweets Go Postal
    Mary Baker
    Ian Robinson
    TinyToCS, 1(2012)
    The US Postal Service is running a large deficit due to dropping demand for first-class mail services; Twitter is a popular social networking site with no current way to monetize fully its user-generated content; and the computer industry always needs new demand for its storage, networking, and imaging products. Prior work has ignored the possibility of solving all of these problems with one mechanism; we see these problems as creating a holistic challenge. Social networking, especially when the application is aimed at enticing teenagers to spend their parents’ money, creates privacy challenges. In particular, the real names and addresses of Twitter users should not be exposed to the people they follow. Through the application of on-demand printing technology, a widely-deployed content delivery network [2], QR codes for embedding machine-readable references to URLs, cloud computing, and privacy-preservation software based on the universally applicable DHT mechanism, we see a new opportunity to combine the burgeoning field of social networking with the time-honored thrill of receiving post cards. Prior approaches (e.g., Apple iCards and get@#%&&ter.com) provide much less dynamic solutions to the problem, and, besides, they fail to meet the bromidic test of using a DHT.
    Report on the SIGCOMM 2011 conference
    John W. Byers
    Fadel Adib
    Jay Aikat
    Danai Chasaki
    Ming-Hung Chen
    Marshini Chetty
    Romain Fontugne
    Vijay Gabale
    László Gyarmati
    Katrina LaCurts
    Qi Liao
    Marc Mendonca
    Trang Cao Minh
    S. H. Shah Newaz
    Pawan Prakash
    Yan Shvartzshnaider
    Praveen Yalagandula
    Chun-Yu Yang
    Computer Communication Review, 42(2012), pp. 80-96
    This document provides reports on the presentations at the SIGCOMM 2011 Conference, the annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM).
    NetLord: a scalable multi-tenant network architecture for virtualized datacenters
    Jayaram Mudigonda
    Praveen Yalagandula
    Bryan Stiekes
    Yanick Pouffary
    SIGCOMM(2011), pp. 62-73
    Providers of “Infrastructure-as-a-Service” need datacenter networks that support multi-tenancy, scale, and ease of operation, at low cost. Most existing network architectures cannot meet all of these needs simultaneously. In this paper we present NetLord, a novel multi-tenant network architecture. NetLord provides tenants with simple and flexible network abstractions, by fully and efficiently virtualizing the address space at both L2 and L3. NetLord can exploit inexpensive commodity equipment to scale the network to several thousands of tenants and millions of virtual machines. NetLord requires only a small amount of offline, one-time configuration. We implemented NetLord on a testbed, and demonstrated its scalability, while achieving order-of-magnitude goodput improvements over previous approaches.
    DevoFlow: scaling flow management for high-performance networks
    Andrew R. Curtis
    Jean Tourrilhes
    Praveen Yalagandula
    Puneet Sharma
    Sujata Banerjee
    SIGCOMM(2011), pp. 254-265
    Preview abstract OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch's control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering. In this paper, we analyze these overheads, and show that OpenFlow's current design cannot meet the needs of high-performance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10–53 times fewer flow table entries at an average switch, and uses 10–42 times fewer control messages. View details
    DevoFlow: cost-effective flow management for high performance enterprise networks
    Jean Tourrilhes
    Praveen Yalagandula
    Puneet Sharma
    Andrew R. Curtis
    Sujata Banerjee
    HotNets(2010), pp. 1
    Preview abstract The OpenFlow framework enables flow-level control over Ethernet switching, as well as centralized visibility of the flows in the network. OpenFlow's coupling of these features comes with costs, however: the distributed-system costs of involving the OpenFlow controller on flow setups, and the switch-implementation costs of involving the switch's control plane too often. In this paper, we analyze the overheads, and we propose DevoFlow, a modification of the OpenFlow model in which we try to gently break the coupling between centralized control and centralized visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. View details
    Report on WREN 2009 -- workshop: research on enterprise networking
    Nathan Farrington
    Nikhil Handigol
    Christoph Mayer
    Kok-Kiong Yap
    Computer Communication Review, 40(2010), pp. 44-49
    Preview abstract WREN 2009, the Workshop on Research on Enterprise Networking, was held on August 21, 2009, in conjunction with SIGCOMM 2009 in Barcelona. WREN focussed on research challenges and results specific to enterprise and data-center networks. Details about the workshop, including the organizers and the papers presented, are at http://conferences.sigcomm.org/sigcomm/2009/workshops/wren/index.php. Approximately 48 people registered to attend WREN. The workshop was structured to encourage a lot of questions and discussion. To record what was said, four volunteer scribes (Nathan Farrington, Nikhil Handigol, Christoph Mayer, and Kok-Kiong Yap) took notes. This report is a merged and edited version of their notes. Please realize that the result, while presented in the form of quotations, is at best a paraphrasing of what was actually said, and in some cases may be mistaken. Also, some quotes might be mis-attributed, and some discussion has been lost, due to the interactive nature of the workshop. The second instance of WREN will be combined with the Internet Network Management Workshop (INM), in conjunction with NSDI 2010; see http://www.usenix.org/event/inmwren10/cfp/ for deadlines and additional information. Also note that two papers from WREN were re-published in the January 2010 issue of Computer Communication Review: “Understanding Data Center Traffic Characteristics,” by Theophilus A Benson, Ashok Anand, Aditya Akella, and Ming Zhang, and “Remote Network Labs: An On-Demand Network Cloud for Configuration Testing,” by Huan Liu and Dan Orban. View details
    SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies
    Jayaram Mudigonda
    Praveen Yalagandula
    Mohammad Al-Fares
    NSDI(2010), pp. 265-280
    Preview abstract Operators of data centers want a scalable network fabric that supports high bisection bandwidth and host mobility, but which costs very little to purchase and administer. Ethernet almost solves the problem – it is cheap and supports high link bandwidths – but traditional Ethernet does not scale, because its spanning-tree topology forces traffic onto a single tree. Many researchers have described “scalable Ethernet” designs to solve the scaling problem, by enabling the use of multiple paths through the network. However, most such designs require specific wiring topologies, which can create deployment problems, or changes to the network switches, which could obviate the commodity pricing of these parts. In this paper, we describe SPAIN (“Smart Path Assignment In Networks”). SPAIN provides multipath forwarding using inexpensive, commodity off-the-shelf (COTS) Ethernet switches, over arbitrary topologies. SPAIN precomputes a set of paths that exploit the redundancy in a given network topology, then merges these paths into a set of trees; each tree is mapped as a separate VLAN onto the physical Ethernet. SPAIN requires only minor end-host software modifications, including a simple algorithm that chooses between pre-installed paths to efficiently spread load over the network. We demonstrate SPAIN’s ability to improve bisection bandwidth over both simulated and experimental data-center networks. View details
    Chimpp: a click-based programming and simulation environment for reconfigurable networking hardware
    Erik Rubow
    Rick McGeer
    Amin Vahdat
    ANCS(2010), pp. 36
    Preview abstract Reconfigurable network hardware makes it easier to experiment with and prototype high-speed networking systems. However, these devices are still relatively hard to program; for example, requiring users to develop in Verilog or VHDL. Further, these devices are commonly designed to work with software on a host computer, requiring the co-development of these hardware and software components. We address this situation with Chimpp, a development environment for reconfigurable network hardware, modeled on the popular Click modular router system. Chimpp employs a modular approach to designing hardware-based packet-processing systems, featuring a simple configuration language similar to that of Click. We demonstrate this development environment by targeting the NetFPGA platform. Chimpp can be combined with Click itself at the software layer for a highly modular, mixed hardware and software design framework. We also enable the integrated simulation of the hardware and software components of a network device together with other network devices using the OMNeT++ network simulator. The goal of Chimpp is to make experimentation easy by providing a toolbox of reusable, modular elements and a way to easily combine them. In contrast with some prior work, Chimpp avoids unnecessary restrictions on module interfaces and design styles. Rather, it is easy to add custom interfaces and to incorporate existing hardware modules. We describe our design and implementation of Chimpp, and provide initial evaluations showing how Chimpp makes it easier to implement, simulate, and modify a variety of packet-processing systems on the NetFPGA platform. View details
    Fast switching of threads between cores
    Richard D. Strong
    Jayaram Mudigonda
    Nathan L. Binkert
    Dean M. Tullsen
    Operating Systems Review, 43(2009), pp. 35-45
    Preview abstract We address the software costs of switching threads between cores in a multicore processor. Fast core switching enables a variety of potential improvements, such as thread migration for thermal management, fine-grained load balancing, and exploiting asymmetric multicores, where performance asymmetry creates opportunities for more efficient resource utilization. Successful exploitation of these opportunities demands low core-switching costs. We describe our implementation of core switching in the Linux kernel, as well as software changes that can decrease switching costs. We use detailed simulations to evaluate several alternative implementations. We also explore how some simple architectural variations can reduce switching costs. We evaluate system efficiency using both real (but symmetric) hardware, and simulated asymmetric hardware, using both microbenchmarks and realistic applications. View details
    WOWCS: the workshop on organizing workshops, conferences, and symposia for computer systems
    Operating Systems Review, 43(2009), pp. 106-107
    Computer systems research at HP labs
    Jay J. Wylie
    Operating Systems Review, 43(2009), pp. 8-9
    Operating System Support for NVM+DRAM Hybrid Main Memory
    Eduardo Argollo
    Mehul A. Shah
    Paolo Faraboschi
    HotOS(2009)
    Preview abstract Technology trends may soon favor building main memory as a hybrid between DRAM and non-volatile memory, such as flash or PC-RAM. We describe how the operating system might manage such hybrid memories, using semantic information not available in other layers. We describe preliminary experiments suggesting that this approach is viable. View details
    Open issues in organizing computer systems conferences
    Tom Anderson
    Computer Communication Review, 38(2008), pp. 93-102
    Preview abstract The Workshop on Organizing Workshops, Conferences, and Symposia for Computer Systems (WOWCS) was organized to “bring together conference organizers (past, present, and future) and other interested people to discuss the issues they confront.” In conjunction with WOWCS, we survey some previous publications that discuss open issues related to organizing computer systems conferences, especially concerning conduct and management of the review process. We also list some topics about which we wish WOWCS had received submissions, but did not; these could be good topics for future articles. View details
    Before and After WOWCS: A literature survey, A list of papers we wish had been submitted
    Tom Anderson
    WOWCS(2008)
    Preview abstract The Workshop on Organizing Workshops, Conferences, and Symposia for Computer Systems (WOWCS) was organized to “bring together conference organizers (past, present, and future) and other interested people to discuss the issues they confront.” In addition to the position papers submitted to the workshop, the WOWCS program committee has collected a bibliography of previous publications in this area. We also list some topics about which we wish we had received submissions, but did not; these could be good topics for future articles. View details
    Looking Between the Street Lamps
    HotPower(2008)
    Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems
    Jayaram Mudigonda
    Nathan L. Binkert
    Parthasarathy Ranganathan
    Vanish Talwar
    IEEE Micro, 28(2008), pp. 26-41
    Preview abstract CPUs consume too much power. Modern complex cores sometimes waste power on functions that are not useful for the code they run. In particular, operating system kernels do not benefit from many power-consuming features intended to improve application performance. We advocate asymmetric single-ISA multicore systems, in which some cores are optimized to run OS code at greatly improved energy efficiency. View details
    Auditing to Keep Online Storage Services Honest
    Mehul A. Shah
    Mary Baker
    Ram Swaminathan
    HotOS(2007)
    Preview abstract A growing number of online service providers offer to store customers' photos, email, file system backups, and other digital assets. Currently, customers cannot make informed decisions about the risk of losing data stored with any particular service provider, reducing their incentive to rely on these services. We argue that third-party auditing is important in creating an online service-oriented economy, because it allows customers to evaluate risks, and it increases the efficiency of insurance-based risk mitigation. We describe approaches and system hooks that support both internal and external auditing of online storage services, describe motivations for service providers and auditors to adopt these approaches, and list challenges that need to be resolved for such auditing to become a reality. View details
    Pip: Detecting the Unexpected in Distributed Systems
    Patrick Reynolds
    Janet L. Wiener
    Mehul A. Shah
    Amin Vahdat
    NSDI(2006)
    WAP5: black-box performance debugging for wide-area systems
    Patrick Reynolds
    Janet L. Wiener
    Marcos Kawazoe Aguilera
    Amin Vahdat
    WWW(2006), pp. 347-356
    Emergent (mis)behavior vs. complex software systems
    EuroSys(2006), pp. 293-304
    SC2D: an alternative to trace anonymization
    Martin F. Arlitt
    MineNet(2006), pp. 323-328
    Operating Systems Should Support Business Change
    HotOS(2005)
    Predicting Short-Transfer Latency from TCP Arcana: A Trace-based Validation
    Martin F. Arlitt
    Balachander Krishnamurthy
    Internet Measurement Conference(2005), pp. 213-226
    HTTP Header Field Registrations (RFC4229)
    M. Nottingham
    IETF(2005)
    Remote Direct Memory Access (RDMA) over IP Problem Statement (RFC4297)
    A. Romanow
    T. Talpey
    S. Bailey
    IETF(2005)
    Clarifying the fundamentals of HTTP
    Softw., Pract. Exper., 34(2004), pp. 103-134
    2 P2P or Not 2 P2P?
    Mema Roussopoulos
    Mary Baker
    David S. H. Rosenthal
    Thomas J. Giuli
    IPTPS(2004), pp. 33-43
    Registration Procedures for Message Header Fields (RFC3864)
    G. Klyne
    M. Nottingham
    IETF(2004)
    Unveiling the transport
    Lawrence S. Brakmo
    David E. Lowell
    Dinesh Subhraveti
    Justin Moore
    Computer Communication Review, 34(2004), pp. 99-106
    Design, Implementation, and Evaluation of Duplicate Transfer Detection in HTTP
    Yee-Man Chan
    Terence Kelly
    NSDI(2004), pp. 43-56
    Utilification
    Jaap Suermondt
    ACM SIGOPS European Workshop(2004), pp. 13
    TCP Offload Is a Dumb Idea Whose Time Has Come
    HotOS(2003), pp. 25-30
    2 P2P or Not 2 P2P?
    Mema Roussopoulos
    Mary Baker
    David S. H. Rosenthal
    Thomas J. Giuli
    CoRR, cs.NI/0311017(2003)
    Architecture and performance of server-directed transcoding
    Björn Knutsson
    Honghui Lu
    Bryan Hopkins
    ACM Trans. Internet Techn., 3(2003), pp. 392-424
    Workshop on network-I/O convergence: experience, lessons, implications (NICELI)
    Vinay Aggarwal
    Olaf Maennel
    Allyn Romanow
    Computer Communication Review, 33(2003), pp. 75-80
    Performance debugging for distributed systems of black boxes
    Marcos Kawazoe Aguilera
    Janet L. Wiener
    Patrick Reynolds
    Athicha Muthitacharoen
    SOSP(2003), pp. 74-89
    The VCDIFF Generic Differencing and Compression Data Format (RFC3284)
    D. Korn
    J. MacDonald
    K. Vo
    IETF(2002)
    Clarifying the fundamentals of HTTP
    WWW(2002), pp. 25-36
    Aliasing on the world wide web: prevalence and performance implications
    Terence Kelly
    WWW(2002), pp. 281-292
    Delta encoding in HTTP (RFC3229)
    B. Krishnamurthy
    F. Douglis
    A. Feldmann
    Y. Goland
    A. van Hoff
    D. Hellerstein
    IETF(2002)
    Instance Digests in HTTP (RFC3230)
    A. Van Hoff
    IETF(2002)
    Server-directed transcoding
    Computer Communications, 24(2001), pp. 155-162
    Toward a Rigorous Data Type Model for HTTP
    HotOS(2001), pp. 176
    Rethinking the TCP Nagle algorithm
    Greg Minshall
    Computer Communication Review, 31(2001), pp. 6-20
    Pulse-Per-Second API for UNIX-like Operating Systems, Version 1.0 (RFC2783)
    D. Mills
    J. Brittenson
    J. Stone
    U. Windl
    IETF(2000)
    Application performance pitfalls and TCP's Nagle algorithm
    Greg Minshall
    Yasushi Saito
    Ben Verghese
    SIGMETRICS Performance Evaluation Review, 27(2000), pp. 36-44
    Resource Containers: A New Facility for Resource Management in Server Systems
    Gaurav Banga
    Peter Druschel
    OSDI(1999), pp. 45-58
    Key Differences Between HTTP/1.0 and HTTP/1.1
    Balachander Krishnamurthy
    David M. Kristol
    Computer Networks, 31(1999), pp. 1737-1751
    Y10K and Beyond (RFC 2550)
    Steve Glassman
    Mark Manasse
    IETF(1999)
    Preview abstract As we approach the end of the millennium, much attention has been paid to the so-called "Y2K" problem. Nearly everyone now regrets the short-sightedness of the programmers of yore who wrote programs designed to fail in the year 2000. Unfortunately, the current fixes for Y2K lead inevitably to a crisis in the year 10,000 when the programs are again designed to fail. This specification provides a solution to the "Y10K" problem which has also been called the "YAK" problem (hex) and the "YXK" problem (Roman numerals). View details
    Brittle Metrics in Operating Systems Research
    Workshop on Hot Topics in Operating Systems(1999), pp. 90-95
    Hypertext Transfer Protocol -- HTTP/1.1 (RFC2616)
    R. Fielding
    J. Gettys
    H. Frystyk
    L. Masinter
    P. Leach
    T. Berners-Lee
    IETF(1999)
    A Scalable and Explicit Event Delivery Mechanism for UNIX
    Gaurav Banga
    Peter Druschel
    USENIX Annual Technical Conference, General Track(1999), pp. 253-265
    Errata for 'Potential benefits of delta encoding and data compression for HTTP'
    Fred Douglis
    Anja Feldmann
    Balachander Krishnamurthy
    Computer Communication Review, 28(1998), pp. 51-55
    Better operating system features for faster network servers
    Gaurav Banga
    Peter Druschel
    SIGMETRICS Performance Evaluation Review, 26(1998), pp. 23-30
    Scalable kernel performance for Internet servers under realistic loads
    Gaurav Banga
    Proc. 1998 USENIX Annual Technical Conf, USENIX, pp. 1-12
    Preview abstract UNIX Internet servers with an event-driven architecture often perform poorly under real workloads, even if they perform well under laboratory benchmarking conditions. We investigated the poor performance of event-driven servers. We found that the delays typical in wide-area networks cause busy servers to manage a large number of simultaneous connections. We also observed that the select system call implementation in most UNIX kernels scales poorly with the number of connections being managed by a process. The UNIX algorithm for allocating file descriptors also scales poorly. These algorithmic problems lead directly to the poor performance of event-driven servers. We implemented scalable versions of the select system call and the descriptor allocation algorithm. This led to an improvement of up to 58% in Web proxy and Web server throughput, and dramatically improved the scalability of the system. View details
    Rate of Change and other Metrics: a Live Study of the World Wide Web
    Fred Douglis
    Anja Feldmann
    Balachander Krishnamurthy
    USENIX Symposium on Internet Technologies and Systems(1997)
    Simple Hit-Metering and Usage-Limiting for HTTP (RFC2227)
    P. Leach
    IETF(1997)
    Eliminating Receive Livelock in an Interrupt-Driven Kernel
    K. K. Ramakrishnan
    ACM Trans. Comput. Syst., 15(1997), pp. 217-252
    Use and Interpretation of HTTP Version Numbers (RFC2145)
    R. Fielding
    J. Gettys
    H. Frystyk
    IETF(1997)
    Potential Benefits of Delta Encoding and Data Compression for HTTP
    Fred Douglis
    Anja Feldmann
    Balachander Krishnamurthy
    SIGCOMM(1997), pp. 181-194
    Exploring the Bounds of Web Latency Reduction from Caching and Prefetching
    Tom M. Kroeger
    Darrell D. E. Long
    USENIX Symposium on Internet Technologies and Systems(1997)
    Hypertext Transfer Protocol -- HTTP/1.1 (RFC2068)
    R. Fielding
    J. Gettys
    H. Frystyk
    T. Berners-Lee
    IETF(1997)
    Eliminating Receive Livelock in an Interrupt-driven Kernel
    K. K. Ramakrishnan
    USENIX Annual Technical Conference(1996), pp. 99-112
    Path MTU Discovery for IP version 6 (RFC1981)
    J. McCann
    S. Deering
    IETF(1996)
    Hinted caching in the Web
    ACM SIGOPS European Workshop(1996), pp. 103-108
    The Case for Persistent-Connection HTTP
    SIGCOMM(1995), pp. 299-313
    Performance Implications of Multiple Pointer Sizes
    Joel F. Bartlett
    Robert N. Mayo
    Amitabh Srivastava
    USENIX Winter(1995), pp. 187-200
    Improving HTTP Latency
    Venkata N. Padmanabhan
    Computer Networks and ISDN Systems, 28(1995), pp. 25-35
    Recovery in Spritely NFS
    Computing Systems, 7(1994), pp. 201-262
    A Better Update Policy
    USENIX Summer(1994), pp. 99-111
    Big Memories on the Desktop
    Workshop on Workstation Operating Systems(1993), pp. 110-115
    Observing TCP Dynamics in Real Networks
    SIGCOMM(1992), pp. 305-317
    Network Locality at the Scale of Processes
    ACM Trans. Comput. Syst., 10(1992), pp. 81-109
    The Effect of Context Switches on Cache Performance
    Anita Borg
    ASPLOS(1991), pp. 75-84
    Network Locality at the Scale of Processes
    SIGCOMM(1991), pp. 273-284
    Efficient Use of Workstations for Passive Monitoring of Local Area Networks
    SIGCOMM(1990), pp. 253-263
    Path MTU discovery (RFC1191)
    S.E. Deering
    IETF(1990)
    Spritely NFS: Experiments with Cache-Consistency Protocols
    V. Srinivasan
    SOSP(1989), pp. 45-57
    IP MTU discovery options (RFC1063)
    C.A. Kent
    C. Partridge
    K. McCloghrie
    IETF(1988)
    Measured capacity of an Ethernet: myths and reality
    David R. Boggs
    Christopher A. Kent
    SIGCOMM(1988), pp. 222-234
    Fragmentation considered harmful
    Christopher A. Kent
    Proc. SIGCOMM, ACM(1987), pp. 390-401
    The Packet Filter: An Efficient Mechanism for User-level Network Code
    Richard F. Rashid
    Michael J. Accetta
    SOSP(1987), pp. 39-51
    Internet Standard Subnetting Procedure (RFC950)
    J. Postel
    IETF(1985)
    Internet subnets (RFC917)
    IETF(1984)
    Broadcasting Internet datagrams in the presence of subnets (RFC922)
    IETF(1984)
    Broadcasting Internet Datagrams (RFC919)
    IETF(1984)
    Representing Information About Files
    ICDCS(1984), pp. 432-439
    A Reverse Address Resolution Protocol (RFC903)
    R. Finlayson
    T. Mann
    M. Theimer
    IETF(1984)