Change Management in Physical Network Lifecycle Automation

Mo Alfares

Virginia Beauregard

Kevin Grant

Angus Griffith

Jahangir Hasan

Chen Huang

Quan Leng

Jiayao Li

Alexander Lin

Zhoutao Liu

Ahmed Mansy

Bill Martinusen

Nikil Mehta

Jeffrey C. Mogul

Andrew Narver

Anshul Nigham

Melanie Obenberger

Sean Smith

Kurt Steinkraus

Sheng Sun

Edward Thiele

Amin Vahdat

Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)


Abstract

Automated management of a physical network's lifecycle is critical for large networks. At Google, we automate network design, construction, evolution, and management using multiple systems. In our experience, one of the primary challenges is to manage change in this domain reliably and efficiently: additions of new hardware and connectivity, planning and sequencing of topology mutations, the introduction of new architectures, new software systems and fixes to old ones, and so on. In particular, we have learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages), while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support. This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management: (1) managing conflicts between multiple operations on the same network; (2) managing conflicts between operations that span the boundaries between networks; and (3) managing representational changes in the models that drive our automated systems. These approaches combine novel software systems with software-engineering practices. While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
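The abstract names conflicts between multiple operations on the same network as a core problem but does not describe the mechanism used to manage them. The sketch below illustrates one generic way such conflicts can be detected, and it is not the system described in the paper. It assumes that each pending change declares up front the set of network entities (switches, links, and so on) it will mutate, and that two changes conflict exactly when those sets overlap. The names `ChangeOp`, `ConflictScheduler`, `admit`, `complete`, and all entity identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeOp:
    """Hypothetical pending change: a name plus the network entities it mutates."""
    name: str
    entities: frozenset[str]


@dataclass
class ConflictScheduler:
    """Hypothetical admission check: an operation runs only if it shares no
    entity with any operation that is already admitted (assumed conflict rule)."""
    active: dict[str, ChangeOp] = field(default_factory=dict)

    def conflicts(self, op: ChangeOp) -> list[str]:
        # Names of admitted operations whose entity sets overlap with `op`.
        return sorted(name for name, other in self.active.items()
                      if op.entities & other.entities)

    def admit(self, op: ChangeOp) -> bool:
        # Reject on any overlap; the caller must wait or re-plan the change.
        if self.conflicts(op):
            return False
        self.active[op.name] = op
        return True

    def complete(self, name: str) -> None:
        # Release the entities held by a finished operation.
        self.active.pop(name, None)


if __name__ == "__main__":
    sched = ConflictScheduler()
    drain = ChangeOp("drain-block-a", frozenset({"switch-a1", "link-a1-s3"}))
    upgrade = ChangeOp("upgrade-a1", frozenset({"switch-a1"}))
    expand = ChangeOp("add-block-b", frozenset({"switch-b1", "link-b1-s3"}))
    print(sched.admit(drain))    # True
    print(sched.admit(upgrade))  # False: both touch switch-a1
    print(sched.admit(expand))   # True: disjoint entities
```

Disjoint changes (here, draining one block while adding another) proceed in parallel, while overlapping ones are serialized. A production system would also need persistence, concurrency control, and a richer notion of conflict than simple set intersection; this in-memory version only shows the core check.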
