Mohammad Al-Fares

Mohammad (Mo) Al-Fares is with the Global Networking team at Google, where he works on software systems that support network design, network management, and cloud networking in Google's production networks. Before joining Google, he earned his Ph.D. in Computer Science at UC San Diego, advised by Prof. Amin Vahdat.
Authored Publications
    Change Management in Physical Network Lifecycle Automation
    Virginia Beauregard
    Kevin Grant
    Angus Griffith
    Jahangir Hasan
    Chen Huang
    Quan Leng
    Jiayao Li
    Alexander Lin
    Zhoutao Liu
    Ahmed Mansy
    Bill Martinusen
    Nikil Mehta
    Andrew Narver
    Anshul Nigham
    Melanie Obenberger
    Sean Smith
    Kurt Steinkraus
    Sheng Sun
    Edward Thiele
    Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
    Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain: additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc. We have especially learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages), while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support. This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management: (1) managing conflicts between multiple operations on the same network; (2) managing conflicts between operations spanning the boundaries between networks; (3) managing representational changes in the models that drive our automated systems. These approaches combine novel software systems and software-engineering practices. While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
    B4 and After: Managing Hierarchy, Partitioning, and Asymmetry for Availability and Scale in Google's Software-Defined WAN
    Min Zhu
    Rich Alimi
    Kondapa Naidu Bollineni
    Chandan Bhagat
    Sourabh Jain
    Jay Kaimal
    Jeffrey Liang
    Kirill Mendelev
    Faro Thomas Rabe
    Saikat Ray
    Malveeka Tewari
    Monika Zahn
    Joon Ong
    Proc. ACM SIGCOMM 2018 Conference (SIGCOMM '18)
    Private WANs are increasingly important to the operation of enterprises, telecoms, and cloud providers. For example, B4, Google's private software-defined WAN, is larger and growing faster than our connectivity to the public Internet. In this paper, we present the five-year evolution of B4. We describe the techniques we employed to incrementally move from offering best-effort content-copy services to carrier-grade availability, while concurrently scaling B4 to accommodate 100x more traffic. Our key challenge is balancing the tension among the hierarchy required for scalability, the partitioning required for availability, and the capacity asymmetry inherent to the construction and operation of any large-scale network. We discuss our approach to managing this tension: (i) we design a custom hierarchical network topology for both horizontal and vertical software scaling; (ii) we manage the inherent capacity asymmetry in hierarchical topologies using a novel traffic engineering algorithm without packet encapsulation; and (iii) we re-architect switch forwarding rules via two-stage matching/hashing to deal with asymmetric network failures at scale.