We design and build the world’s largest, fastest, most reliable data-center and WAN networks, to enable compute and storage not available anywhere else.
About the team
Our team brings together experts in networking, distributed systems, kernel and systems programming, and algorithms to create the networks that power Google. Our networks are among the world’s largest and fastest, and we design them to be reliable, cheap, and easy to evolve. We often use new technologies unavailable outside Google.
We exemplify Google’s Hybrid Approach to Research: we deploy real-world systems at global scale. Many members of our team have extensive research experience; we publish papers at conferences such as SIGCOMM, NSDI, SOSP, and OSDI, and we work closely with interns and faculty from leading universities.
Every Google product relies on the technologies we develop. Our networks support complex, highly available, planetary-scale distributed systems with billions of users. We constantly evolve our networks to meet the requirements of, and create opportunities for, new and better Google products, especially the rapidly growing Google Cloud.
Congestion control, network measurement, and traffic management
All networks are subject to congestion; we want to operate ours at high utilization levels (to reduce costs) while meeting strict performance objectives. We’re inventing new congestion avoidance protocols and scaling up our global-scale, near-real-time, automated traffic engineering system. We’re building new techniques to measure our networks accurately and at scale, both to drive our evaluation of congestion-control techniques and to provide real-time input to automated traffic management.
Data-center network design
We continue to innovate in designs for scalable, fast, cheap, reliable, and evolvable data-center networks. When necessary, we design our own hardware, and innovate in network topology and routing protocols. We’re exploring automatic techniques to optimize network designs.
Network management
We’re working towards increasingly automated network management systems, enabling us to rapidly repair and modify our networks with little or no downtime. We’re using techniques such as formal modeling of network topologies and highly available distributed systems, while working closely with Google’s network operators to implement automated workflows.
Programmable packet processing
To match the continuing increases in storage and networking hardware speed, we are developing new communication APIs and mechanisms for low-latency, CPU-efficient communication. We want our network switches and endpoints to implement novel packet-processing functions, such as load balancing, virtualization, access control, reliable transport, and packet-level event monitoring, without compromising on cost or performance. We’re exploring a variety of hardware and software techniques for fast, flexible, safe packet processing, including onload, offload, RDMA, P4, and more.
Software-defined networking
We use software-defined networking extensively in both data-center networks and WANs. We collaborated on and popularized early work on OpenFlow, and continue to raise the level of abstraction for silicon-agnostic switching. We are developing SDN controller platforms that can handle Google’s needs for scale and reliability, and a set of SDN applications for routing, traffic management, and other functions.
High velocity development and testing
To introduce our network innovations into production as rapidly as possible, without compromising availability, we test our designs and implementations early, often, and extensively. We are developing advanced software validation techniques, we embrace automation in all aspects of testing and qualification, and we build powerful infrastructure for testing, debugging, and root-causing, in both physical and emulated testbeds.
Wide-area networks
We’ve developed one of the world’s largest, most cost-effective wide area networks, and we continue to find ways to increase its scale and reliability, while extracting the best possible performance from expensive WAN hardware and fiber links. We’re employing Google-designed hardware, SDN controllers, and global-scale automated traffic engineering to address these challenges.
Some of our people
At Google, I enjoy the best balance in combining cutting-edge research with impact at the largest scale. Scaling one of the world's largest networks requires answering fundamental research questions combined with world-class engineering to put it all into practice.
Join our team
PhD-level software engineers in Network Infrastructure apply their research training to the toughest problems in designing and building large-scale, high-performance, highly available distributed systems that design, manage, measure, and control our data-center, WAN, and peering-edge SDN networks (each of which has been the subject of at least one SIGCOMM paper). We're also creating innovative end-host stacks to support CPU-efficient, low-latency, congestion-aware communication with secure isolation between users. You'll work with other skilled, creative people, including people who wrote research papers you've read, and you'll stay connected with the academic research community.
Networking Software Engineers in Network Infrastructure work on Google's data-center hardware and software infrastructure, providing research, design, development, testing, and support services to enhance our networking software solutions. In this role, you'll work to deliver Google's next-generation networks and help solve the hardest problems in scale and availability.
You’ll use your technical expertise in distributed computing and large-scale systems to design, develop, test, deploy, maintain, and enhance software solutions, and to build the distributed services, abstractions, and system components that operate and power the world's largest network infrastructure.
Network Test Engineers (NTEs) in Network Infrastructure assess the quality of Network Infrastructure’s products by performing early manual testing, analyzing regression results, shaping test plans, and building testbeds. NTEs are experts in how products operate and in finding and diagnosing flaws or gaps. You shepherd products through validation testing and into production, ensuring that testbeds, test coverage, and release pipelines are ready for each stage, and keeping quality processes and release cycles efficient and scalable throughout each product's lifecycle. When needed, you write code to achieve these goals.