Charisma Chan
Authored Publications
Google has written the book on Site Reliability Engineering best practices, but how teams actually respond to production incidents often differs from the ideal practices we put on paper.
This article will cover the reality of debugging issues in production at Google, including the types of tools, high-level strategies, and low-level tasks that engineers use in varying combinations to effectively debug. We will: 1) detail the research approach taken to capture this data and surface patterns of behavior, 2) share findings on the common engineering pathways, processes, and attitudes in this space, and 3) share examples of how experts have debugged complex distributed systems, highlighting where best practices were followed or broken.