Jump to Content
Rui Wang

Rui Wang

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Preview abstract Three prominent traffic features including peak alignment, stable ranking, and gravity model, have guided the design of current Google Jupiter fabrics in traffic engineering, topology engineering, and capacity planning. View details
    Preview abstract Traffic load balancing across multiple paths is a critical task for modern networks to reduce network congestion and improve network efficiency. Hashing which is the foundation of traffic load balancing still faces practical challenges. The key problem is there is a growing need for more hash functions because networks are getting larger with more switches, more stages and increased path diversity. Meanwhile topology and routing becomes more agile in order to efficiently serve traffic demands with stricter throughput and latency SLAs. On the other hand, current generation switch chips only provide a limited number of uncorrelated hash functions. We first demonstrate why the limited number of hashing functions is a practical challenge in today's datacenter network (DCN) and wide-area network (WAN) designs. Then, to mitigate the problem, we propose a novel approach named \textsl{color recombining} which enables hash functions reuse via leveraging topology traits of multi-stage DCN networks. We also describe a novel framework based on \textsl{\coprime} theory to mitigate hash correlation in generic mesh topologies (i.e., spineless DCN and WAN). Our evaluation on real network trace data and topologies demonstrate that we can reduce the extent of load imbalance (measured by coefficient of variation) by an order of magnitude. View details
    Preview abstract To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network infrequently, rendering these blocking designs practical enough for deployment in the near future. The key to Gemini is the joint optimization of topology and routing, using as input a robust estimation of future traffic derived from multiple historical traffic matrices. Gemini “hedges” against unpredicted bursts, by spreading these bursts across multiple paths, to minimize packet loss in exchange for a small increase in path lengths. It incorporates a robust decision algorithm to determine when to reconfigure, and whether to use hedging. Data from tens of production fabrics allows us to categorize these as either low- or high-volatility; these categories seem stable. For the former, Gemini finds topologies and routing with near-optimal performance and cost. For the latter, Gemini’s use of multi-traffic-matrix optimization and hedging avoids the need for frequent topology reconfiguration, with only marginal increases in path length. As a result, Gemini can support existing workloads on these production fabrics using a spine-free topology that is half the cost of the existing topology on these fabrics. View details
    Minimal Rewiring: Efficient Live Expansion for Clos Data Center Networks
    Shizhen Zhao
    Joon Ong
    Proc. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2019), USENIX Association (to appear)
    Preview abstract Clos topologies have been widely adopted for large-scale data center networks (DCNs), but it has been difficult to support incremental expansions of Clos DCNs. Some prior work has assumed that it is impossible to design DCN topologies that are both well-structured (non-random) and incrementally expandable at arbitrary granularities. We demonstrate that it is indeed possible to design such networks, and to expand them while they are carrying live traffic, without incurring packet loss. We use a layer of patch panels between blocks of switches in a Clos network, which makes physical rewiring feasible, and we describe how to use integer linear programming (ILP) to minimize the number of patch-panel connections that must be changed, which makes expansions faster and cheaper. We also describe a block-aggregation technique that makes our ILP approach scalable. We tested our "minimal-rewiring" solver on two kinds of fine-grained expansions using 2250 synthetic DCN topologies, and found that the solver can handle 99% of these cases while changing under 25% of the connections. Compared to prior approaches, this solver (on average) reduces the number of "stages" per expansion by about 3.1X -- a significant improvement to our operational costs, and to our exposure (during expansions) to capacity-reducing faults. View details
    No Results Found