Charisma Chan

Authored Publications
Google has written the book on Site Reliability Engineering best practices, but how teams actually respond to production incidents often differs from the ideal practices we put on paper. This article will cover the reality of debugging issues in production at Google, including the types of tools, high-level strategies, and low-level tasks that engineers use in varying combinations to effectively debug. We will: 1) detail the research approach taken to capture this data and surface patterns of behavior, 2) share findings on the common engineering pathways, processes, and attitudes in this space, and 3) share examples of how experts have debugged complex distributed systems, highlighting where best practices were followed or broken.