Engineering Reliability into Sites

RAM Huntsville (2016)

Abstract

This talk introduces Site Reliability Engineering (SRE) at Google,
explaining its purpose and describing the challenges it addresses.
SRE teams manage Google's many services and web sites from our offices in
Pittsburgh, New York, London, Sydney, Zurich, Los Angeles, Dublin, Mountain View, ...
They draw upon the Linux-based computing resources
that are distributed in data centers around the world.