Product Reliability for Google Maps

Joe Abrams; Micah Lerner

(2024) (to appear)

Abstract

Our organization has become very good at protecting server SLOs with reliability best practices such as scaling globally distributed architectures, toil mitigation, and continuous reliability improvements. Even so, we noticed that a majority of incidents impacting our end users were not showing up as SLO misses.

In many cases these outages were not even observable from the server side. For example, the rollout of a new version of the consumer mobile application (which our services power) to an app store could break one or more critical features due to bugs in client code. This reality has changed the way we approach reliability: we’re shifting our focus from server reliability to product reliability.

We’re not yet finished with the transition, but we’re starting to see very positive results. Our talk shares challenges we’ve solved so far, lessons we’ve learned, and our vision for the future.
