Designing and Operating Highly Available Software Systems at Scale

Michael Wildpaner
Escuela Politécnica de Ingeniería de Gijón, Gijón (2019)

Abstract

The talk explains what Site Reliability Engineering (SRE) is and how it is practiced at Google, and gives an overview of the challenges of scaling a regular LAMP-style small service to support 100M users. It also covers monitoring and other SRE dimensions, from capacity planning to design reviews.