![Jeffrey C. Mogul](https://storage.googleapis.com/gweb-research2023-media/pubtools/87.png)
Jeffrey C. Mogul
Jeff Mogul works on fast, cheap, reliable, and flexible networking infrastructure for Google. Until 2013, he was a Fellow at HP Labs, doing research primarily on computer networks and operating-systems issues for enterprise and cloud computer systems; previously, he worked at the DEC/Compaq Western Research Lab. He received his PhD from Stanford in 1986, an MS from Stanford in 1980, and an SB from MIT in 1979. He is an ACM Fellow. Jeff is the author or co-author of several Internet Standards; he contributed extensively to the HTTP/1.1 specification. He was an associate editor of Internetworking: Research and Experience, and has been the chair or co-chair of a variety of conferences and workshops, including SIGCOMM, OSDI, NSDI, USENIX, HotOS, and ANCS.
You can find a mostly up-to-date CV at http://jmogul.com/mogulcv.pdf
Authored Publications
Abstract:
While many network research papers address issues of deployability, with a few exceptions this has been limited to protocol compatibility or switch-resource constraints, such as flow-table sizes. We argue that good network designs must also consider the costs and complexities of deploying the design within the constraints of the physical environment in a datacenter: *physical* deployability. The traditional metrics of network "goodness" mostly do not account for these costs and constraints, and this may partially explain why some otherwise attractive designs have not been deployed in real-world datacenters.
Change Management in Physical Network Lifecycle Automation
Mo Alfares
Virginia Beauregard
Kevin Grant
Angus Griffith
Jahangir Hasan
Chen Huang
Quan Leng
Jiayao Li
Alexander Lin
Zhoutao Liu
Ahmed Mansy
Bill Martinusen
Nikil Mehta
Andrew Narver
Anshul Nigham
Melanie Obenberger
Sean Smith
Kurt Steinkraus
Sheng Sun
Edward Thiele
Amin Vahdat
Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
Abstract:
Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain: additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, and so on.
We have especially learned the importance of supporting multiple kinds of change in parallel, without conflicts or mistakes (which cause outages), while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support.
This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management:
(1) managing conflicts between multiple operations on the same network;
(2) managing conflicts between operations spanning the boundaries between networks;
(3) managing representational changes in the models that drive our automated systems.
These approaches combine both novel software systems and software-engineering practices.
While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
Data-driven Networking Research: models for academic collaboration with Industry (a Google point of view)
Priya Mahadevan
Christophe Diot
Amin Vahdat
Computer Communication Review, 51(4) (2021), pp. 47-49
Abstract:
We (Google's networking teams) would like to increase our collaborations with academic researchers related to data-driven networking research.
There are some significant constraints on our ability to directly share data, and in case not everyone in the community understands these, this document provides a brief summary.
There are some models which can work (primarily, interns and visiting scientists).
We describe some specific areas where we would welcome proposals to work within those models.
Cores that don't count
Parthasarathy Ranganathan
Amin Vahdat
Proc. 18th Workshop on Hot Topics in Operating Systems (HotOS 2021)
Abstract:
We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, we have observed ephemeral computational errors that were not detected during manufacturing tests. These defects cannot always be mitigated by techniques such as microcode updates, and may be correlated to specific components within the processor, allowing small code changes to effect large shifts in reliability. Worse, these failures are often "silent": the only symptom is an erroneous computation.
We refer to a core that develops such behavior as "mercurial." Mercurial cores are extremely rare, but in a large fleet of servers we can observe the correlated disruption they cause, often enough to see them as a distinct problem: one that will require collaboration between hardware designers, processor vendors, and systems software architects.
This paper is a call to action for a new focus in systems research; we speculate about several software-based approaches to mercurial cores, ranging from better detection and isolation mechanisms to methods for tolerating the silent data corruption they cause.
Abstract:
To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network infrequently, rendering these blocking designs practical enough for deployment in the near future.
The key to Gemini is the joint optimization of topology and routing, using as input a robust estimation of future traffic derived from multiple historical traffic matrices. Gemini “hedges” against unpredicted bursts, by spreading these bursts across multiple paths, to minimize packet loss in exchange for a small increase in path lengths. It incorporates a robust decision algorithm to determine when to reconfigure, and whether to use hedging.
Data from tens of production fabrics allows us to categorize these fabrics as either low- or high-volatility; these categories appear stable. For the former, Gemini finds topologies and routing with near-optimal performance and cost. For the latter, Gemini's use of multi-traffic-matrix optimization and hedging avoids the need for frequent topology reconfiguration, with only marginal increases in path length. As a result, Gemini can support existing workloads on these production fabrics using a spine-free topology that is half the cost of the existing one.
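The hedging idea can be illustrated with a toy model: choose a traffic split across two paths that minimizes the worst-case link utilization over several historical traffic snapshots. Everything here (the two-path topology, capacities, demands, and the grid search) is invented for illustration; Gemini's actual optimizer is far more elaborate.

```python
def worst_case_util(f, snapshots, cap1=10.0, cap2=10.0):
    """Worst link utilization over all snapshots if fraction f of the
    flow uses path 1, and the rest shares path 2 with background load."""
    worst = 0.0
    for demand, background in snapshots:
        u1 = f * demand / cap1
        u2 = ((1 - f) * demand + background) / cap2
        worst = max(worst, u1, u2)
    return worst

def hedged_split(snapshots, steps=1000):
    """Grid-search the split ratio that minimizes worst-case utilization."""
    return min((i / steps for i in range(steps + 1)),
               key=lambda f: worst_case_util(f, snapshots))

# Historical (demand, background) pairs, including an unpredicted burst.
history = [(6.0, 1.0), (7.0, 3.0), (5.0, 5.0)]
split = hedged_split(history)
```

Optimizing against all snapshots at once, rather than only the most recent one, is what lets a plan absorb bursts without reconfiguring the topology.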
Experiences with Modeling Network Topologies at Multiple Levels of Abstraction
Martin Pool
Xiaoxue Zhao
17th Symposium on Networked Systems Design and Implementation (NSDI) (2020)
Abstract:
Network management is becoming increasingly automated, and automation depends on detailed, explicit representations of data about both the state of a network and an operator's intent for its networks. In particular, we must explicitly represent the desired and actual topology of a network; almost all other network-management data either derives from its topology, constrains how to use a topology, or associates resources (e.g., addresses) with specific places in a topology.
We describe MALT, a Multi-Abstraction-Layer Topology representation, which supports virtually all of our network-management phases: design, deployment, configuration, operation, measurement, and analysis. MALT provides interoperability across software systems, and its support for abstraction allows us to explicitly tie low-level network elements to high-level design intent. MALT supports a declarative style that simplifies what-if analysis and testbed support.
We also describe the software base that supports efficient use of MALT, as well as numerous, sometimes painful, lessons we have learned about curating the taxonomy for a comprehensive, and evolving, representation for topology.
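As a rough illustration of multi-level topology modeling, one can imagine entities at different abstraction levels linked by explicit containment, so low-level elements remain traceable to high-level intent. The entity kinds and fields below are invented for this sketch; the real MALT schema is far richer.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    kind: str                    # e.g. "fabric", "switch", "port"
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Record that this entity contains a lower-level entity."""
        self.children.append(child)
        return child

    def find(self, kind):
        """All descendants of a given kind, at any depth."""
        found = [c for c in self.children if c.kind == kind]
        for c in self.children:
            found.extend(c.find(kind))
        return found

# High-level design intent (a fabric) tied to low-level elements.
fabric = Entity("fabric", "fab1")
switch = fabric.add(Entity("switch", "sw1"))
switch.add(Entity("port", "sw1:p1"))
switch.add(Entity("port", "sw1:p2"))
```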
Minimal Rewiring: Efficient Live Expansion for Clos Data Center Networks
Shizhen Zhao
Rui Wang
Junlan Zhou
Joon Ong
Amin Vahdat
Proc. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2019), USENIX Association
Abstract:
Clos topologies have been widely adopted for large-scale data center networks (DCNs), but it has been difficult to support incremental expansions of Clos DCNs. Some prior work has assumed that it is impossible to design DCN topologies that are both well-structured (non-random) and incrementally expandable at arbitrary granularities.
We demonstrate that it is indeed possible to design such networks, and to expand them while they are carrying live traffic, without incurring packet loss. We use a layer of patch panels between blocks of switches in a Clos network, which makes physical rewiring feasible, and we describe how to use integer linear programming (ILP) to minimize the number of patch-panel connections that must be changed, which makes expansions faster and cheaper. We also describe a block-aggregation technique that makes our ILP approach scalable.
We tested our "minimal-rewiring" solver on two kinds of fine-grained expansions using 2250 synthetic DCN topologies, and found that the solver can handle 99% of these cases while changing under 25% of the connections. Compared to prior approaches, this solver (on average) reduces the number of "stages" per expansion by about 3.1X, a significant improvement to our operational costs, and to our exposure (during expansions) to capacity-reducing faults.
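The core objective can be caricatured in a few lines: among all wirings that satisfy a degree requirement, pick the one that changes the fewest existing connections. The paper solves this at scale with ILP plus block aggregation; this brute-force toy only conveys the objective, on invented inputs.

```python
from itertools import combinations

def rewiring_cost(current, target):
    """Number of connections that must be removed or added."""
    return len(current ^ target)

def min_rewire(current, nodes, degree):
    """Brute-force: the wiring with the given uniform degree that is
    closest to the current wiring (tiny inputs only)."""
    all_links = list(combinations(nodes, 2))
    best = None
    for k in range(len(all_links) + 1):
        for cand in combinations(all_links, k):
            cand = set(cand)
            deg = {n: 0 for n in nodes}
            for a, b in cand:
                deg[a] += 1
                deg[b] += 1
            if all(d == degree for d in deg.values()):
                if best is None or rewiring_cost(current, cand) < rewiring_cost(current, best):
                    best = cand
    return best

# A path A-B-C-D reaches a 2-regular ring by adding just one link.
current = {("A", "B"), ("B", "C"), ("C", "D")}
target = min_rewire(current, ["A", "B", "C", "D"], degree=2)
```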
Abstract:
Cloud customers want reliable, understandable promises from cloud providers that their applications will run reliably and with adequate performance, but today, providers offer only limited guarantees, which creates uncertainty for customers. Providers also must define internal metrics to allow them to operate their systems without violating customer promises or expectations. We explore why these guarantees are hard to define. We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We also suggest that defining guarantees in terms of defense against threats, rather than guarantees for application-visible outcomes, can reduce the complexity of these problems. Overall, we offer a partial framework for thinking about Service Level Objectives (SLOs), and discuss some unsolved challenges.
Abstract:
We increasingly depend on the availability of online services, either directly as users, or indirectly, when cloud-provider services support directly-accessed services. The availability of these "visible services" depends in complex ways on the availability of a complex underlying set of invisible infrastructure services.
In our experience, most software engineers lack useful frameworks to create and evaluate designs for individual services that support end-to-end availability in these infrastructures, especially given cost, performance, and other constraints on viable commercial services.
Even given the extensive research literature on techniques for replicated state machines and other fault-tolerance mechanisms, we found little help in this literature for addressing infrastructure-wide availability. Past research has often focused on point solutions, rather than end-to-end ones. In particular, it seems quite difficult to define useful targets for infrastructure-level availability, and then to translate these to design requirements for individual services.
We argue that, in many but not all ways, one can think about availability with the mindset that we have learned to use for security, and we discuss some general techniques that appear useful for implementing and operating high-availability infrastructures. We encourage a shift in emphasis for academic research into availability.
Condor: Better Topologies through Declarative Design
Brandon Schlinker
Radhika Niranjan Mysore
Sean Smith
Amin Vahdat
Minlan Yu
Ethan Katz-Bassett
Michael Rubin
SIGCOMM '15 (2015)
Abstract:
The design space for large, multipath datacenter networks is large and complex, and no one design fits all purposes. Network architects must trade off many criteria to design cost-effective, reliable, and maintainable networks, and typically cannot explore much of the design space. We present Condor, our approach to enabling a rapid, efficient design cycle. Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures. Condor then uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria. We show that TDL supports concise descriptions of topologies such as fat-trees, BCube, and DCell; that we can generate known and novel variants of fat-trees with simple changes to a TDL file; and that we can synthesize large topologies in tens of seconds. We also show that Condor supports the daunting task of designing multi-phase network expansions that can be carried out on live networks.
Inferring the Network Latency Requirements of Cloud Tenants
Ramana Rao Kompella
15th Workshop on Hot Topics in Operating Systems (HotOS XV), USENIX Association (2015)
Abstract:
Cloud IaaS and PaaS tenants rely on cloud providers to provide network infrastructures that make the appropriate tradeoff between cost and performance. This can include mechanisms to help customers understand the performance requirements of their applications. Previous research (e.g., Proteus and Cicada) has shown how to do this for network-bandwidth demands, but cloud tenants may also need to meet latency objectives, which in turn may depend on reliable limits on network latency, and its variance, within the cloud provider's infrastructure. On the other hand, if network latency is sufficient for an application, further decreases in latency might add cost without any benefit. Therefore, both tenant and provider have an interest in knowing what network latency is good enough for a given application.
This paper explores several options for a cloud provider to infer a tenant's network-latency demands, with varying tradeoffs between requirements for tenant participation, accuracy of inference, and instrumentation overhead. In particular, we explore the feasibility of a hypervisor-only mechanism, which would work without any modifications to tenant code, even in IaaS clouds.
Abstract:
Predictably sharing the network is critical to achieving high utilization in the datacenter. Past work has focussed on providing bandwidth to endpoints, but often we want to allocate resources among multi-node services. In this paper, we present Parley, which provides service-centric minimum bandwidth guarantees, which can be composed hierarchically. Parley also supports service-centric weighted sharing of bandwidth in excess of these guarantees. Further, we show how to configure these policies so services can get low latencies even at high network load. We evaluate Parley on a multi-tiered oversubscribed network connecting 90 machines, each with a 10Gb/s network interface, and demonstrate that Parley is able to meet its goals.
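In spirit, service-centric guarantees plus weighted sharing of excess bandwidth can be sketched as: satisfy each service's guarantee first, then water-fill the spare capacity by weight, capped by demand. This single-level sketch with invented numbers ignores Parley's hierarchical composition and its latency mechanisms.

```python
def allocate(capacity, services):
    """services: name -> (guarantee, weight, demand).
    Assumes the guarantees are admissible (their sum <= capacity)."""
    # Step 1: every service gets its guarantee, capped by its demand.
    alloc = {n: min(g, d) for n, (g, w, d) in services.items()}
    spare = capacity - sum(alloc.values())
    # Step 2: water-fill the spare capacity by weight, capped by demand.
    active = {n for n, (g, w, d) in services.items() if alloc[n] < d}
    while spare > 1e-9 and active:
        total_w = sum(services[n][1] for n in active)
        share = {n: spare * services[n][1] / total_w for n in active}
        spare = 0.0
        for n in list(active):
            g, w, d = services[n]
            take = min(share[n], d - alloc[n])
            alloc[n] += take
            spare += share[n] - take   # unused share is redistributed
            if alloc[n] >= d - 1e-9:
                active.discard(n)
    return alloc

# Two services on a 10 Gb/s link: (guarantee, weight, demand).
alloc = allocate(10.0, {"web": (2.0, 1, 8.0), "batch": (1.0, 1, 2.0)})
```

The loop is work-conserving: bandwidth a service cannot use flows back into the spare pool for the remaining services.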
Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions using FlowTags
Seyed Kaveh Fayazbakhsh
Luis Chang
Vyas Sekar
Minlan Yu
Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14), USENIX Association (2014), pp. 533-546
Abstract:
Middleboxes provide key security and performance guarantees in networks. Unfortunately, the dynamic traffic modifications they induce make it difficult to reason about network management tasks such as access control, accounting, and diagnostics. This also makes it difficult to integrate middleboxes into SDN-capable networks and leverage the benefits that SDN can offer.
In response, we develop the FlowTags architecture. FlowTags-enhanced middleboxes export tags to provide the necessary causal context (e.g., source hosts or internal cache/miss state). SDN controllers can configure the tag generation and tag consumption operations using new FlowTags APIs. These operations help restore two key SDN tenets: (i) bindings between packets and their "origins," and (ii) ensuring that packets follow policy-mandated paths.
We develop new controller mechanisms that leverage FlowTags. We show the feasibility of minimally extending middleboxes to support FlowTags. We also show that FlowTags imposes low overhead over traditional SDN mechanisms. Finally, we demonstrate the early promise of FlowTags in enabling new verification and diagnosis capabilities.
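The tagging idea can be sketched concretely: a NAT-like middlebox rewrites headers but attaches a tag recording the packet's origin, so a downstream switch can still apply source-based policy after the rewrite. The tag table, policy, and addresses below are all invented for illustration.

```python
TAGS = {}                    # tag -> original source (controller-programmed)
BLOCKED = {"10.0.0.9"}       # policy: sources denied access

def nat(packet, public_ip="203.0.113.1"):
    """Rewrite the source address, but tag the packet with its origin."""
    tag = len(TAGS) + 1
    TAGS[tag] = packet["src"]
    return {"src": public_ip, "dst": packet["dst"], "tag": tag}

def switch_allows(packet):
    """Enforce source-based policy using the tag, not the (rewritten) header."""
    origin = TAGS.get(packet["tag"])
    return origin not in BLOCKED

ok = switch_allows(nat({"src": "10.0.0.5", "dst": "8.8.8.8"}))
bad = switch_allows(nat({"src": "10.0.0.9", "dst": "8.8.8.8"}))
```

In the real architecture, the controller programs both the tag-generation and tag-consumption tables through the FlowTags APIs.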
Democratic Resolution of Resource Conflicts Between SDN Control Programs
Alvin AuYoung
Yadi Ma
Sujata Banerjee
Jeongkeun Lee
Puneet Sharma
Yoshio Turner
Chen Liang
Proceedings of the 10th ACM International Conference on emerging Networking Experiments and Technologies (CoNEXT '14), ACM (2014), pp. 391-402
Abstract:
Resource conflicts are inevitable on any shared infrastructure. In Software-Defined Networks (SDNs), different controller modules with diverse objectives may be installed on the SDN controller. Each module independently generates resource requests that may conflict with the objectives of a different module. For example, a controller module for maintaining high availability may want resource allocations that require too much core network bandwidth and thus conflict with another module that aims to minimize core bandwidth usage. In such a situation, it is imperative to identify and install resource allocations that achieve network-wide global objectives that may not be known to individual modules, e.g., high availability with acceptable bandwidth usage. This problem has received only limited attention, with most prior work focused on detecting, avoiding, and resolving rule-level conflicts in the context of OpenFlow.
In this paper, we present an automatic resolution mechanism based on a family of voting procedures, and apply it to resolve resource conflicts among SDN and cloud controller programs. We observe that the choice of appropriate resolution mechanism depends on two properties of the deployed modules: their precision and parity. Based on these properties, a network operator can apply a range of resolution techniques. We present two such techniques.
Overall, our system promotes modularity and does not require each controller module to divulge its objectives or algorithms to other modules. We demonstrate the improvement in allocation quality over various alternative resolution methods, such as static priorities or equal weight, round-robin decisions. Finally, we provide a qualitative comparison of this work to recent methods based on utility or currency.
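As one concrete instance of such a voting procedure, a Borda count lets each module rank the candidate allocations without revealing its internal objective function. The module roles and candidate names below are invented; the paper studies a whole family of such procedures.

```python
def borda(rankings, candidates):
    """rankings: one list per module, best candidate first.
    Each position contributes (n - 1 - position) points."""
    scores = {c: 0 for c in candidates}
    for ranking in rankings:
        for position, cand in enumerate(ranking):
            scores[cand] += len(candidates) - 1 - position
    return max(candidates, key=lambda c: scores[c])

candidates = ["alloc_A", "alloc_B", "alloc_C"]
votes = [
    ["alloc_A", "alloc_B", "alloc_C"],  # availability module's ranking
    ["alloc_B", "alloc_A", "alloc_C"],  # bandwidth module's ranking
    ["alloc_B", "alloc_C", "alloc_A"],  # load-balancing module's ranking
]
winner = borda(votes, candidates)
```

Here the compromise candidate wins even though no module gave up its own ordering, which is the modularity property the paper is after.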
Cicada: Introducing Predictive Guarantees for Cloud Networks
Katrina LaCurts
Hari Balakrishnan
Yoshio Turner
6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), USENIX Association (2014)
Abstract:
In cloud-computing systems, network-bandwidth guarantees have been shown to improve predictability of application performance and cost. Most previous work on cloud-bandwidth guarantees has assumed that cloud tenants know what bandwidth guarantees they want. However, as we show in this work, application bandwidth demands can be complex and time-varying, and many tenants might lack sufficient information to request a guarantee that is well-matched to their needs, which can lead to over-provisioning (and thus reduced cost-efficiency) or under-provisioning (and thus poor user experience).
We analyze traffic traces gathered over six months from an HP Cloud Services datacenter, finding that application bandwidth consumption is both time-varying and spatially inhomogeneous. This variability makes it hard to predict requirements. To solve this problem, we develop a prediction algorithm usable by a cloud provider to suggest an appropriate bandwidth guarantee to a tenant. With tenant VM placement using these predictive guarantees, we find that the inter-rack network utilization in certain datacenter topologies can be more than doubled.
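A minimal caricature of a predictive guarantee: suggest a value that would have covered most past intervals, rather than asking the tenant to guess. The percentile and headroom parameters below are invented for illustration; Cicada's actual predictor models temporal and spatial structure in the traces.

```python
def suggest_guarantee(history_mbps, percentile=0.95, headroom=1.1):
    """Suggest a bandwidth guarantee from observed per-interval usage:
    a high percentile of history, plus some headroom."""
    ranked = sorted(history_mbps)
    idx = int(percentile * (len(ranked) - 1))
    return ranked[idx] * headroom

# Ten observed intervals (Mb/s), one containing an outlier burst.
usage = [120, 90, 400, 110, 95, 130, 105, 115, 98, 125]
guarantee = suggest_guarantee(usage)  # keyed to 130, not the 400 burst
```

Using a percentile rather than the maximum keeps one burst from inflating the suggested (and billed) guarantee.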
FlowTags: Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions
Seyed Kaveh Fayazbakhsh
Vyas Sekar
Minlan Yu
Proc. ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN), ACM (2013)
Abstract:
Past studies show that middleboxes are a critical piece of network infrastructure for providing security and performance guarantees. Unfortunately, the dynamic and traffic-dependent modifications induced by middleboxes make it difficult to reason about the correctness of network-wide policy enforcement (e.g., access control, accounting, and performance diagnostics). Using practical application scenarios, we argue that we need a flow-tracking capability to ensure consistent policy enforcement in the presence of such dynamic traffic modifications. To this end, we propose FlowTags, an extended SDN architecture in which middleboxes add Tags to outgoing packets, to provide the necessary causal context (e.g., source hosts or internal cache/miss state). These Tags are used on switches and (other) middleboxes for systematic policy enforcement. We discuss the early promise of minimally extending middleboxes to provide this support. We also highlight open challenges in the design of southbound and northbound FlowTags APIs; new control-layer applications for enforcing and verifying policies; and automatically modifying legacy middleboxes to support FlowTags.
Corybantic: towards the modular composition of SDN control programs
Alvin AuYoung
Sujata Banerjee
Lucian Popa
Jeongkeun Lee
Jayaram Mudigonda
Puneet Sharma
Yoshio Turner
Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks (HotNets-XII), ACM (2013)
Abstract:
Software-Defined Networking (SDN) promises to enable vigorous innovation, through separation of the control plane from the data plane, and to enable novel forms of network management, through a controller that uses a global view to make globally-valid decisions. The design of SDN controllers creates novel challenges; much previous work has focused on making them scalable, reliable, and efficient.
However, prior work has ignored the problem that multiple controller functions may be competing for resources (e.g., link bandwidth or switch table slots). Our Corybantic design supports modular composition of independent controller modules, which manage different aspects of the network while competing for resources. Each module tries to optimize one or more objective functions; we address the challenge of how to coordinate between these modules to maximize the overall value delivered by the controllers' decisions, while still achieving modularity.
ElasticSwitch: practical work-conserving bandwidth guarantees for cloud computing
Lucian Popa
Praveen Yalagandula
Sujata Banerjee
Yoshio Turner
Jose Renato Santos
Proceedings of the ACM SIGCOMM 2013 conference, ACM, pp. 351-362
Abstract:
While cloud computing providers offer guaranteed allocations for resources such as CPU and memory, they do not offer any guarantees for network resources. The lack of network guarantees prevents tenants from predicting lower bounds on the performance of their applications. The research community has recognized this limitation but, unfortunately, prior solutions have significant limitations: either they are inefficient, because they are not work-conserving, or they are impractical, because they require expensive switch support or congestion-free network cores.
In this paper, we propose ElasticSwitch, an efficient and practical approach for providing bandwidth guarantees. ElasticSwitch is efficient because it utilizes the spare bandwidth from unreserved capacity or underutilized reservations. ElasticSwitch is practical because it can be fully implemented in hypervisors, without requiring a specific topology or any support from switches. Because hypervisors operate mostly independently, there is no need for complex coordination between them or with a central controller. Our experiments, with a prototype implementation on a 100-server testbed, demonstrate that ElasticSwitch provides bandwidth guarantees and is work-conserving, even in challenging situations.
The NIC Is the Hypervisor: Bare-Metal Guests in IaaS Clouds
Jayaram Mudigonda
Jose Renato Santos
Yoshio Turner
14th Workshop on Hot Topics in Operating Systems (HotOS XIV), USENIX Association (2013)
Abstract:
Cloud computing does not inherently require the use of virtual machines, and some cloud customers prefer or even require “bare metal” systems, where no hypervisor separates the guest operating system from the CPU. Even for bare-metal nodes, the cloud provider must find a means to isolate the guest system from other cloud resources, and to manage the instantiation and removal of guests. We argue that an enhanced NIC, together with standard features of modern servers, can provide all of the functions for which a hypervisor would normally be required.
What we talk about when we talk about cloud network performance
Abstract:
Infrastructure-as-a-Service ("Cloud") data-centers intrinsically depend on high-performance networks to connect servers within the data-center and to the rest of the world. Cloud providers typically offer different service levels, and associated prices, for different sizes of virtual machine, memory, and disk storage. However, while all cloud providers provide network connectivity to tenant VMs, they seldom make any promises about network performance, and so cloud tenants suffer from highly-variable, unpredictable network performance. Many cloud customers do want to be able to rely on network performance guarantees, and many cloud providers would like to offer (and charge for) these guarantees. But nobody really agrees on how to define these guarantees, and it turns out to be challenging to define "network performance" in a way that is useful to both customers and providers. We attempt to bring some clarity to this question.
On the Security of Conference and Journal Submission Sites
Abstract:
It is well known that many, if not most, people re-use the same password on multiple Web sites [3], even though this practice has been frequently criticized by privacy and security experts. Therefore, Web applications that allow users to choose their own passwords should, at the very least, protect these passwords in transit using SSL [2].
HotCRP is one such Web application. HotCRP is now widely used by computer systems conferences and journals; for example, Tiny ToCS.
TweeCards: Tweets Go Postal
Abstract:
The US Postal Service is running a large deficit due to dropping demand for first-class mail services; Twitter is a popular social networking site with no current way to monetize fully its user-generated content; and the computer industry always needs new demand for its storage, networking, and imaging products. Prior work has ignored the possibility of solving all of these problems with one mechanism; we see these problems as creating a holistic challenge.
Social networking, especially when the application is aimed at enticing teenagers to spend their parents' money, creates privacy challenges. In particular, the real names and addresses of Twitter users should not be exposed to the people they follow.
Through the application of on-demand printing technology, a widely-deployed content delivery network [2], QR codes for embedding machine-readable references to URLs, cloud computing, and privacy-preservation software based on the universally applicable DHT mechanism, we see a new opportunity to combine the burgeoning field of social networking with the time-honored thrill of receiving post cards.
Prior approaches (e.g., Apple iCards and get@#%&&ter.com) provide much less dynamic solutions to the problem, and, besides, they fail to meet the bromidic test of using a DHT.
Report on the SIGCOMM 2011 conference
John W. Byers
Fadel Adib
Jay Aikat
Danai Chasaki
Ming-Hung Chen
Marshini Chetty
Romain Fontugne
Vijay Gabale
László Gyarmati
Katrina LaCurts
Qi Liao
Marc Mendonca
Trang Cao Minh
S. H. Shah Newaz
Pawan Prakash
Yan Shvartzshnaider
Praveen Yalagandula
Chun-Yu Yang
Computer Communication Review, 42 (2012), pp. 80-96
Abstract:
This document provides reports on the presentations at the SIGCOMM 2011 Conference, the annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM).
NetLord: a scalable multi-tenant network architecture for virtualized datacenters
Jayaram Mudigonda
Praveen Yalagandula
Bryan Stiekes
Yanick Pouffary
SIGCOMM (2011), pp. 62-73
Abstract:
Providers of "Infrastructure-as-a-Service" need datacenter networks that support multi-tenancy, scale, and ease of operation, at low cost. Most existing network architectures cannot meet all of these needs simultaneously.
In this paper we present NetLord, a novel multi-tenant network architecture. NetLord provides tenants with simple and flexible network abstractions, by fully and efficiently virtualizing the address space at both L2 and L3. NetLord can exploit inexpensive commodity equipment to scale the network to several thousands of tenants and millions of virtual machines. NetLord requires only a small amount of offline, one-time configuration. We implemented NetLord on a testbed, and demonstrated its scalability, while achieving order-of-magnitude goodput improvements over previous approaches.
DevoFlow: scaling flow management for high-performance networks
Andrew R. Curtis
Jean Tourrilhes
Praveen Yalagandula
Puneet Sharma
Sujata Banerjee
SIGCOMM (2011), pp. 254-265
Abstract:
OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch's control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering.
In this paper, we analyze these overheads, and show that OpenFlow's current design cannot meet the needs of high-performance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10--53 times fewer flow table entries at an average switch, and uses 10--42 times fewer control messages.
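A back-of-envelope illustration of why devolving control shrinks switch state: per-flow exact-match rules versus one wildcard rule per destination that leaves path selection to the switch. The addresses and counts are synthetic; DevoFlow's actual mechanisms are rule cloning and local multipath actions.

```python
# 50 sources each talking to 4 destinations.
flows = [(f"10.0.{s}.1", f"10.1.{d}.1") for s in range(50) for d in range(4)]

# Fine-grained control: one exact-match table entry per flow.
exact_rules = {(src, dst) for src, dst in flows}

# Devolved control: one wildcard entry per destination.
wildcard_rules = {dst for _, dst in flows}

reduction = len(exact_rules) / len(wildcard_rules)
```

The tradeoff, as the abstract notes, is visibility: the controller no longer sees every flow setup, so DevoFlow must recover enough statistics to find the large flows worth managing centrally.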
View details
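The devolved control that DevoFlow proposes — letting a switch "clone" a wildcard rule into a local exact-match rule so that later packets of a flow never involve the controller — can be sketched roughly as below. This is a minimal illustration under assumed data structures (dict-based packet headers, a `clone` flag, string actions), not the paper's actual switch interface:

```python
from dataclasses import dataclass


@dataclass
class WildcardRule:
    match: dict          # header field -> required value; absent fields are wildcards
    action: str
    clone: bool = False  # DevoFlow-style CLONE flag: devolve control to the switch


class Switch:
    def __init__(self, wildcard_rules):
        self.wildcards = wildcard_rules
        self.exact = {}  # frozen headers -> action: locally cloned microflow rules

    def handle(self, pkt):
        key = frozenset(pkt.items())
        if key in self.exact:         # hit a cloned exact-match rule: stay in data plane
            return self.exact[key]
        for rule in self.wildcards:
            if all(pkt.get(f) == v for f, v in rule.match.items()):
                if rule.clone:
                    # Clone the wildcard into an exact-match rule for this
                    # microflow, without contacting the controller.
                    self.exact[key] = rule.action
                return rule.action
        return "send_to_controller"   # table miss: fall back to the controller
```

Only the first packet of a cloned flow touches the wildcard table; subsequent packets hit the exact-match entry, which is the source of the reduction in controller involvement the abstract measures.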
DevoFlow: cost-effective flow management for high performance enterprise networks
Jean Tourrilhes
Praveen Yalagandula
Puneet Sharma
Andrew R. Curtis
Sujata Banerjee
HotNets(2010), pp. 1
Preview abstract
The OpenFlow framework enables flow-level control over Ethernet switching, as well as centralized visibility of the flows in the network. OpenFlow's coupling of these features comes with costs, however: the distributed-system costs of involving the OpenFlow controller on flow setups, and the switch-implementation costs of involving the switch's control plane too often.
In this paper, we analyze the overheads, and we propose DevoFlow, a modification of the OpenFlow model in which we try to gently break the coupling between centralized control and centralized visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs.
View details
Report on WREN 2009 -- workshop: research on enterprise networking
Nathan Farrington
Nikhil Handigol
Christoph Mayer
Kok-Kiong Yap
Computer Communication Review, 40(2010), pp. 44-49
Preview abstract
WREN 2009, the Workshop on Research on Enterprise Networking, was held on August 21, 2009, in conjunction with SIGCOMM 2009 in Barcelona. WREN focused on research challenges and results specific to enterprise and data-center networks. Details about the workshop, including the organizers and the papers presented, are at http://conferences.sigcomm.org/sigcomm/2009/workshops/wren/index.php. Approximately 48 people registered to attend WREN.
The workshop was structured to encourage a lot of questions and discussion. To record what was said, four volunteer scribes (Nathan Farrington, Nikhil Handigol, Christoph Mayer, and Kok-Kiong Yap) took notes. This report is a merged and edited version of their notes. Please realize that the result, while presented in the form of quotations, is at best a paraphrasing of what was actually said, and in some cases may be mistaken. Also, some quotes might be mis-attributed, and some discussion has been lost, due to the interactive nature of the workshop.
The second instance of WREN will be combined with the Internet Network Management Workshop (INM), in conjunction with NSDI 2010; see http://www.usenix.org/event/inmwren10/cfp/ for deadlines and additional information.
Also note that two papers from WREN were re-published in the January 2010 issue of Computer Communication Review: “Understanding Data Center Traffic Characteristics,” by Theophilus A Benson, Ashok Anand, Aditya Akella, and Ming Zhang, and “Remote Network Labs: An On-Demand Network Cloud for Configuration Testing,” by Huan Liu and Dan Orban.
View details
SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies
Preview abstract
Operators of data centers want a scalable network fabric that supports high bisection bandwidth and host mobility, but which costs very little to purchase and administer. Ethernet almost solves the problem – it is cheap and supports high link bandwidths – but traditional Ethernet does not scale, because its spanning-tree topology forces traffic onto a single tree. Many researchers have described “scalable Ethernet” designs to solve the scaling problem, by enabling the use of multiple paths through the network. However, most such designs require specific wiring topologies, which can create deployment problems, or changes to the network switches, which could obviate the commodity pricing of these parts.
In this paper, we describe SPAIN (“Smart Path Assignment In Networks”). SPAIN provides multipath forwarding using inexpensive, commodity off-the-shelf (COTS) Ethernet switches, over arbitrary topologies. SPAIN precomputes a set of paths that exploit the redundancy in a given network topology, then merges these paths into a set of trees; each tree is mapped as a separate VLAN onto the physical Ethernet. SPAIN requires only minor end-host software modifications, including a simple algorithm that chooses between pre-installed paths to efficiently spread load over the network. We demonstrate SPAIN’s ability to improve bisection bandwidth over both simulated and experimental data-center networks.
View details
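The path-merging step the abstract describes — packing precomputed paths into loop-free trees, one VLAN per tree — can be sketched as follows. This is a minimal greedy illustration under assumed data structures (paths as node lists, a union-find loop check), not the paper's actual algorithm; reserving VLAN 1 for the default spanning tree is also an assumption:

```python
from itertools import count


def union_is_forest(edge_sets):
    """Union-find cycle check: True iff the union of the edge sets is loop-free."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in set().union(*edge_sets):
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a loop
        parent[ru] = rv
    return True


def merge_paths_into_vlans(paths):
    """Greedily pack each precomputed path into the first VLAN whose edge set
    stays loop-free after adding it; otherwise open a new VLAN for it."""
    vlans = []                 # list of (vlan_id, edge_set) pairs
    vlan_ids = count(start=2)  # assumption: VLAN 1 kept for the default tree
    for path in paths:
        path_edges = {tuple(sorted(e)) for e in zip(path, path[1:])}
        for _vid, edge_set in vlans:
            if union_is_forest([edge_set, path_edges]):
                edge_set |= path_edges
                break
        else:
            vlans.append((next(vlan_ids), set(path_edges)))
    return vlans
```

On a four-switch ring A–B–C–D, for example, the two disjoint paths between A and C must land in different VLANs, since their union would form a loop.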
Chimpp: a click-based programming and simulation environment for reconfigurable networking hardware
Preview abstract
Reconfigurable network hardware makes it easier to experiment with and prototype high-speed networking systems. However, these devices are still relatively hard to program; for example, requiring users to develop in Verilog or VHDL. Further, these devices are commonly designed to work with software on a host computer, requiring the co-development of these hardware and software components.
We address this situation with Chimpp, a development environment for reconfigurable network hardware, modeled on the popular Click modular router system. Chimpp employs a modular approach to designing hardware-based packet-processing systems, featuring a simple configuration language similar to that of Click. We demonstrate this development environment by targeting the NetFPGA platform. Chimpp can be combined with Click itself at the software layer for a highly modular, mixed hardware and software design framework. We also enable the integrated simulation of the hardware and software components of a network device together with other network devices using the OMNeT++ network simulator.
The goal of Chimpp is to make experimentation easy by providing a toolbox of reusable, modular elements and a way to easily combine them. In contrast with some prior work, Chimpp avoids unnecessary restrictions on module interfaces and design styles. Rather, it is easy to add custom interfaces and to incorporate existing hardware modules.
We describe our design and implementation of Chimpp, and provide initial evaluations showing how Chimpp makes it easier to implement, simulate, and modify a variety of packet-processing systems on the NetFPGA platform.
View details
Fast switching of threads between cores
Richard D. Strong
Jayaram Mudigonda
Nathan L. Binkert
Dean M. Tullsen
Operating Systems Review, 43(2009), pp. 35-45
Preview abstract
We address the software costs of switching threads between cores in a multicore processor. Fast core switching enables a variety of potential improvements, such as thread migration for thermal management, fine-grained load balancing, and exploiting asymmetric multicores, where performance asymmetry creates opportunities for more efficient resource utilization. Successful exploitation of these opportunities demands low core-switching costs. We describe our implementation of core switching in the Linux kernel, as well as software changes that can decrease switching costs. We use detailed simulations to evaluate several alternative implementations. We also explore how some simple architectural variations can reduce switching costs. We evaluate system efficiency using both real (but symmetric) hardware, and simulated asymmetric hardware, using both microbenchmarks and realistic applications.
View details
WOWCS: the workshop on organizing workshops, conferences, and symposia for computer systems
Operating Systems Review, 43(2009), pp. 106-107
Computer systems research at HP labs
Operating System Support for NVM+DRAM Hybrid Main Memory
Preview abstract
Technology trends may soon favor building main memory as a hybrid between DRAM and non-volatile memory, such as flash or PC-RAM. We describe how the operating system might manage such hybrid memories, using semantic information not available in other layers. We describe preliminary experiments suggesting that this approach is viable.
View details
Open issues in organizing computer systems conferences
Preview abstract
The Workshop on Organizing Workshops, Conferences, and Symposia for Computer Systems (WOWCS) was organized to “bring together conference organizers (past, present, and future) and other interested people to discuss the issues they confront.” In conjunction with WOWCS, we survey some previous publications that discuss open issues related to organizing computer systems conferences, especially concerning conduct and management of the review process. We also list some topics about which we wish WOWCS had received submissions, but did not; these could be good topics for future articles.
View details
Before and After WOWCS: A literature survey, A list of papers we wish had been submitted
Preview abstract
The Workshop on Organizing Workshops, Conferences, and Symposia for Computer Systems (WOWCS) was organized to bring together conference organizers (past, present, and future) and other interested people to discuss the issues they confront. In addition to the position papers submitted to the workshop, the WOWCS program committee has collected a bibliography of previous publications in this area. We also list some topics about which we wish we had received submissions, but did not; these could be good topics for future articles.
View details
Looking Between the Street Lamps
HotPower(2008)
Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems
Jayaram Mudigonda
Nathan L. Binkert
Parthasarathy Ranganathan
Vanish Talwar
IEEE Micro, 28(2008), pp. 26-41
Preview abstract
CPUs consume too much power. Modern complex cores sometimes waste power on functions that are not useful for the code they run. In particular, operating system kernels do not benefit from many power-consuming features intended to improve application performance. We advocate asymmetric single-ISA multicore systems, in which some cores are optimized to run OS code at greatly improved energy efficiency.
View details
Auditing to Keep Online Storage Services Honest
Preview abstract
A growing number of online service providers offer to store customers' photos, email, file system backups, and other digital assets. Currently, customers cannot make informed decisions about the risk of losing data stored with any particular service provider, reducing their incentive to rely on these services. We argue that third-party auditing is important in creating an online service-oriented economy, because it allows customers to evaluate risks, and it increases the efficiency of insurance-based risk mitigation. We describe approaches and system hooks that support both internal and external auditing of online storage services, describe motivations for service providers and auditors to adopt these approaches, and list challenges that need to be resolved for such auditing to become a reality.
View details
Pip: Detecting the Unexpected in Distributed Systems
Patrick Reynolds
Janet L. Wiener
Mehul A. Shah
Amin Vahdat
NSDI(2006)
WAP5: black-box performance debugging for wide-area systems
Patrick Reynolds
Janet L. Wiener
Marcos Kawazoe Aguilera
Amin Vahdat
WWW(2006), pp. 347-356
Emergent (mis)behavior vs. complex software systems
EuroSys(2006), pp. 293-304
SC2D: an alternative to trace anonymization
Operating Systems Should Support Business Change
HotOS(2005)
Predicting Short-Transfer Latency from TCP Arcana: A Trace-based Validation
Martin F. Arlitt
Balachander Krishnamurthy
Internet Measurement Conference(2005), pp. 213-226
{HTTP Header Field Registrations} (RFC4229)
{Remote Direct Memory Access (RDMA) over IP Problem Statement} (RFC4297)
Clarifying the fundamentals of HTTP
Softw. Pract. Exper., 34(2004), pp. 103-134
2 P2P or Not 2 P2P?
Mema Roussopoulos
Mary Baker
David S. H. Rosenthal
Thomas J. Giuli
IPTPS(2004), pp. 33-43
{Registration Procedures for Message Header Fields} (RFC3864)
Unveiling the transport
Lawrence S. Brakmo
David E. Lowell
Dinesh Subhraveti
Justin Moore
Computer Communication Review, 34(2004), pp. 99-106
Design, Implementation, and Evaluation of Duplicate Transfer Detection in HTTP
Utilification
TCP Offload Is a Dumb Idea Whose Time Has Come
HotOS(2003), pp. 25-30
2 P2P or Not 2 P2P?
Mema Roussopoulos
Mary Baker
David S. H. Rosenthal
Thomas J. Giuli
CoRR, cs.NI/0311017(2003)
Architecture and performance of server-directed transcoding
Björn Knutsson
Honghui Lu
Bryan Hopkins
ACM Trans. Internet Techn., 3(2003), pp. 392-424
Workshop on network-I/O convergence: experience, lessons, implications (NICELI)
Vinay Aggarwal
Olaf Maennel
Allyn Romanow
Computer Communication Review, 33(2003), pp. 75-80
Performance debugging for distributed systems of black boxes
Marcos Kawazoe Aguilera
Janet L. Wiener
Patrick Reynolds
Athicha Muthitacharoen
SOSP(2003), pp. 74-89
{The VCDIFF Generic Differencing and Compression Data Format} (RFC3284)
Clarifying the fundamentals of HTTP
WWW(2002), pp. 25-36
Aliasing on the world wide web: prevalence and performance implications
{Delta encoding in HTTP} (RFC3229)
{Instance Digests in HTTP} (RFC3230)
Server-directed transcoding
Computer Communications, 24(2001), pp. 155-162
Toward a Rigorous Data Type Model for HTTP
HotOS(2001), pp. 176
Rethinking the TCP Nagle algorithm
{Pulse-Per-Second API for UNIX-like Operating Systems, Version 1.0} (RFC2783)
Application performance pitfalls and TCP's Nagle algorithm
Greg Minshall
Yasushi Saito
Ben Verghese
SIGMETRICS Performance Evaluation Review, 27(2000), pp. 36-44
Resource Containers: A New Facility for Resource Management in Server Systems
Key Differences Between HTTP/1.0 and HTTP/1.1
Balachander Krishnamurthy
David M. Kristol
Computer Networks, 31(1999), pp. 1737-1751
Y10K and Beyond (RFC 2550)
Preview abstract
As we approach the end of the millennium, much attention has been paid to the so-called "Y2K" problem. Nearly everyone now regrets the short-sightedness of the programmers of yore who wrote programs designed to fail in the year 2000. Unfortunately, the current fixes for Y2K lead inevitably to a crisis in the year 10,000 when the programs are again designed to fail.
This specification provides a solution to the "Y10K" problem which has also been called the "YAK" problem (hex) and the "YXK" problem (Roman numerals).
View details
Brittle Metrics in Operating Systems Research
Workshop on Hot Topics in Operating Systems(1999), pp. 90-95
{Hypertext Transfer Protocol -- HTTP/1.1} (RFC2616)
A Scalable and Explicit Event Delivery Mechanism for UNIX
Gaurav Banga
Peter Druschel
USENIX Annual Technical Conference, General Track(1999), pp. 253-265
Errata for 'Potential benefits of delta encoding and data compression for HTTP'
Fred Douglis
Anja Feldmann
Balachander Krishnamurthy
Computer Communication Review, 28(1998), pp. 51-55
Better operating system features for faster network servers
Gaurav Banga
Peter Druschel
SIGMETRICS Performance Evaluation Review, 26(1998), pp. 23-30
Scalable kernel performance for Internet servers under realistic loads
Preview abstract
UNIX Internet servers with an event-driven architecture often perform poorly under real workloads, even if they perform well under laboratory benchmarking conditions. We investigated the poor performance of event-driven servers. We found that the delays typical in wide-area networks cause busy servers to manage a large number of simultaneous connections. We also observed that the select system call implementation in most UNIX kernels scales poorly with the number of connections being managed by a process. The UNIX algorithm for allocating file descriptors also scales poorly. These algorithmic problems lead directly to the poor performance of event-driven servers.
We implemented scalable versions of the select system call and the descriptor allocation algorithm. This led to an improvement of up to 58% in Web proxy and Web server throughput, and dramatically improved the scalability of the system.
View details
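The select scalability problem described here is what eventually led to explicit event-delivery interfaces such as epoll and kqueue, which report only the ready descriptors instead of scanning every registered one. Python's selectors module wraps whichever of these the OS provides; the sketch below, with a socketpair standing in for a network connection, shows the event-driven readiness pattern these servers rely on:

```python
import selectors
import socket

# DefaultSelector picks epoll/kqueue where available, so the per-call cost
# scales with the number of *ready* descriptors, not all registered ones.
sel = selectors.DefaultSelector()

a, b = socket.socketpair()        # stand-in for a client/server connection
for s in (a, b):
    s.setblocking(False)          # event-driven servers never block on one fd
sel.register(b, selectors.EVENT_READ)

a.send(b"ping")                   # make one end readable
events = sel.select(timeout=1)    # returns only descriptors that are ready
ready = [key.fileobj for key, _mask in events]
assert ready == [b]
assert b.recv(16) == b"ping"

sel.unregister(b)
a.close()
b.close()
```

A real server would register many connections and dispatch on each ready descriptor in a loop; the key property is that an idle connection costs nothing per `select()` call.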
Rate of Change and other Metrics: a Live Study of the World Wide Web
Fred Douglis
Anja Feldmann
Balachander Krishnamurthy
USENIX Symposium on Internet Technologies and Systems(1997)
{Simple Hit-Metering and Usage-Limiting for HTTP} (RFC2227)
Eliminating Receive Livelock in an Interrupt-Driven Kernel
{Use and Interpretation of HTTP Version Numbers} (RFC2145)
Potential Benefits of Delta Encoding and Data Compression for HTTP
Exploring the Bounds of Web Latency Reduction from Caching and Prefetching
Tom M. Kroeger
Darrell D. E. Long
USENIX Symposium on Internet Technologies and Systems(1997)
{Hypertext Transfer Protocol -- HTTP/1.1} (RFC2068)
Eliminating Receive Livelock in an Interrupt-driven Kernel
{Path MTU Discovery for IP version 6} (RFC1981)
Hinted caching in the Web
ACM SIGOPS European Workshop(1996), pp. 103-108
The Case for Persistent-Connection HTTP
SIGCOMM(1995), pp. 299-313
Performance Implications of Multiple Pointer Sizes
Joel F. Bartlett
Robert N. Mayo
Amitabh Srivastava
USENIX Winter(1995), pp. 187-200
Improving HTTP Latency
Recovery in Spritely NFS
Computing Systems, 7(1994), pp. 201-262
A Better Update Policy
USENIX Summer(1994), pp. 99-111
Big Memories on the Desktop
Workshop on Workstation Operating Systems(1993), pp. 110-115
Observing TCP Dynamics in Real Networks
SIGCOMM(1992), pp. 305-317
Network Locality at the Scale of Processes
ACM Trans. Comput. Syst., 10(1992), pp. 81-109
The Effect of Context Switches on Cache Performance
Network Locality at the Scale of Processes
SIGCOMM(1991), pp. 273-284
Efficient Use of Workstations for Passive Monitoring of Local Area Networks
SIGCOMM(1990), pp. 253-263
{Path MTU discovery} (RFC1191)
Spritely NFS: Experiments with Cache-Consistency Protocols
{IP MTU discovery options} (RFC1063)
Measured capacity of an Ethernet: myths and reality
Fragmentation considered harmful
The Packet Filter: An Efficient Mechanism for User-level Network Code
{Internet Standard Subnetting Procedure} (RFC950)
{Internet subnets} (RFC917)
IETF(1984)
{Broadcasting Internet datagrams in the presence of subnets} (RFC922)
IETF(1984)
{Broadcasting Internet Datagrams} (RFC919)
IETF(1984)
Representing Information About Files
ICDCS(1984), pp. 432-439
{A Reverse Address Resolution Protocol} (RFC903)