Global networking
The Global Networking (GN) team at Google is responsible for the design, development, build and operation of the networks that connect our data centers to our customers.
About the team
The Global Networking team is responsible for the design, development, build and operation of Google’s global network, which every Google service, including Google Cloud Platform, runs on. We develop cutting-edge networking technologies that allow Google's global WAN to be zero touch, build some of the largest-scale software-defined networking (SDN) infrastructure ever deployed (B4, Espresso), scale the global content delivery networks (CDNs) that support Google services, and develop sophisticated software systems for network capacity forecasting, planning and optimization. We leverage the latest advances in AI/ML to drive greater autonomy in network design and operations.
We continuously expand the reach of Google's network by laying new optical fibers and building hundreds of points of presence worldwide. This global footprint allows us to optimize the end-to-end speed and reliability of the traffic that we carry for our users and for Google Cloud customers, delivering high performance and best-in-class availability.
In doing all this, we develop and rely on the most advanced techniques in network hardware and software, traffic engineering, and network management to deliver unprecedented scale, availability and performance at industry-leading cost points. We are also advancing the state of the art in data analytics and machine learning to drive network efficiency and optimization at scale.
Google has a long history of fundamental research in networking. We have engaged in a collaborative research effort with the National Science Foundation and other industrial partners to launch a $40 million program in academic research for Resilient and Intelligent Next-Generation (NextG) Systems, or RINGS. In addition to funding, Google offers expertise, research collaborations, infrastructure, and in-kind support for researchers and students as they advance knowledge and progress in the field.
Team focus summaries
All networks are subject to congestion; we operate ours at high utilization levels while meeting strict performance objectives. We’re inventing new congestion avoidance protocols and improving our global-scale, near-real-time, automated traffic engineering system. We’re building better ways to measure our networks, accurately and at scale, both to drive our evaluation of congestion-control techniques and to serve as real-time input to automated traffic management.
We collect traffic statistics all around our network infrastructure to track performance, quickly detect unusual events, and compute SLA compliance. We rely on the most advanced data science techniques, machine learning in particular, to reduce the time it takes to detect and root-cause events. We have designed and deployed techniques that can detect, pinpoint and mitigate network problems within a few minutes without human intervention. We use predictive analytics to anticipate some types of problems and then adjust our traffic engineering or plan capacity increases.
We’re building automated network management systems, enabling us to rapidly repair and improve our networks with little or no downtime. We’re using techniques such as formal modeling of network topologies and highly available distributed systems, while working closely with vendors and network operators to implement open software APIs that enable greater levels of automation and programmability across the entire network management lifecycle.
We work on developing and deploying cutting-edge optical solutions to scale cost-effectively and to increase network availability. These include new coherent transmission technologies, disaggregated line systems, high-capacity submarine wet plants, subsea switching technologies, transport SDN configurations, and sophisticated physical and logical layer design and optimization tools.
We are developing new mechanisms for low-latency, CPU-efficient communication. We want our network switches and endpoints to implement novel on-device packet processing functions, without compromising cost or performance. We’re exploring hardware and software techniques for fast, flexible, safe packet processing, including onload, offload, RDMA, P4, and more.
To introduce network innovations into production as rapidly as possible, without compromising availability, we test our designs and implementations early, often, and extensively. We are developing advanced software validation techniques. We embrace automation in all aspects of testing and qualification, and we build powerful infrastructure for testing, debugging, and root-causing, in both physical and emulated testbeds, building a “digital twin” for our network infrastructure.
We employ SDN extensively. We were early users of, and contributors to, OpenFlow, and continue, with the P4 programming language, to raise the level of abstraction for silicon-agnostic switching. We are developing SDN controller platforms that can handle Google’s needs for scale and reliability, and SDN applications for routing, traffic management, and other functions. We recently published a paper on our experiments with decentralizing SDN architecture for increased scalability and faster convergence times.
We’ve developed one of the world’s largest, most cost-effective wide area networks, and we continue to increase its scale and reliability, while extracting the best possible performance from WAN hardware and fiber links. We’re employing Google-designed and vendor-supplied hardware, SDN controllers, and global-scale automated traffic engineering to address these challenges.
Google is rapidly expanding its global network to meet the surging and unpredictable demands of its products, cloud customers, and groundbreaking AI/ML applications. To support this growth, AIOps is pioneering transformative approaches to build and manage networks powered by best-in-class AI algorithms and GenAI models. Specifically, we are leveraging Gemini to develop intelligent agents that enhance network autonomy. We are also training custom machine learning models on Google's internal network data to drive robust forecasting, failure prediction and prevention, and precise root cause analysis and mitigation.
Some of our people
- Christophe Diot
  - Data Mining and Modeling
  - Networking
- Dennis Fetterly
  - Distributed Systems and Parallel Computing
- Drago Goricanec
  - Networking
- Bikash Koley
  - Distributed Systems and Parallel Computing
  - Networking
- Subhasree Mandal
  - Networking
- Steve Padgett
  - Hardware and Architecture
  - Networking
- Pavlos Papageorge
  - Networking
- Anees Shaikh
  - Distributed Systems and Parallel Computing
  - Networking
  - Software Systems
- Rob Shakir
  - Networking
- Mattia Cantono
  - Networking
- Phillipa Gill
  - Networking
  - Security, Privacy and Abuse Prevention
- Priya Mahadevan