Designing and Operating Highly Available Software Systems at Scale
Abstract
The talk explains what Site Reliability Engineering (SRE) is and how it is used at Google, and gives an overview of the challenges of growing a regular LAMP-style small service into one that supports 100M users. It also covers monitoring and other SRE dimensions, from capacity planning to design reviews.