David Challoner

These days I primarily lead the team that runs the systems implementing different kinds of data access controls across Google. These systems are used by external products like Assured Workloads, Access Transparency, and Customer Approvals. I also work on furthering Google's adoption of its own cloud technologies through our Enterprise Cloud Solutions (ECS) workgroup, focusing specifically on networking and Google's internal adoption of Anthos Service Mesh (ASM).

Previously, I worked on the services that power our Endpoint Verification and Context-Aware Access pipelines, which are used both internally at Google and by Cloud customers to implement Zero Trust (or BeyondCorp) IT security. Before that, I worked on internal storage (NFS/SMB, distributed software-defined storage).

External Publications:
Authored Publications
    The Site Reliability Engineering Workbook Chapter: Eliminating Toil
    Chris Schrier
    David Huska
    James O'Keeffe
    Joanna L. Wijntjes
    Matt Sartwell
    Vivek Rau
    The Site Reliability Engineering Workbook: Practical Ways to Implement SRE (2018)
    Abstract: Google SREs spend much of their time optimizing: squeezing every bit of performance from a system through project work and developer collaboration. But the scope of optimization isn’t limited to compute resources: it’s also important that SREs optimize how they spend their time. Primarily, we want to avoid performing tasks classified as toil. For a comprehensive discussion of toil, see Chapter 5 in Site Reliability Engineering. For the purposes of this chapter, we’ll define toil as the repetitive, predictable, constant stream of tasks related to maintaining a service. Toil is seemingly unavoidable for any team that manages a production service. System maintenance inevitably demands a certain amount of rollouts, upgrades, restarts, alert triaging, and so forth. These activities can quickly consume a team if left unchecked and unaccounted for. Google limits the time SRE teams spend on operational work (including both toil- and non-toil-intensive work) at 50% (for more context on why, see Chapter 5 in our first book). While this target may not be appropriate for your organization, there’s still an advantage to placing an upper bound on toil, as identifying and quantifying toil is the first step toward optimizing your team’s time.