“Canarying” is a colloquial term that comes from the practice of bringing a caged canary into a mine to detect dangerous gases. John Scott Haldane proposed the idea around 1913.
In this article, canarying is a partial and time-limited deployment of a change in a service, followed by an evaluation of whether the service change is safe. Based on that evaluation, the production change process may then roll forward, roll back, alert a human, or take some other action. Effective canarying involves many decisions (for example, how to deploy the partial service change or how to choose meaningful metrics) and deserves a separate discussion.
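To make the definition concrete, the evaluation step can be sketched as a function that compares a metric from the canary population against the same metric from an unchanged control population and returns one of the outcomes named above. This is only an illustrative sketch, not how CAS works: the names `Decision` and `evaluate_canary`, the choice of error rate as the metric, the 10% tolerance, and the minimum sample count are all assumptions made for the example.

```python
from enum import Enum
from statistics import mean


class Decision(Enum):
    """Possible outcomes of a canary evaluation (hypothetical names)."""
    ROLL_FORWARD = "roll_forward"
    ROLL_BACK = "roll_back"
    ALERT_HUMAN = "alert_human"


def evaluate_canary(control_error_rates, canary_error_rates,
                    max_relative_increase=0.10, min_samples=5):
    """Compare canary error rates against control error rates.

    Both arguments are lists of error-rate samples collected during the
    time-limited canary window. The threshold and sample count are
    illustrative assumptions, not recommended values.
    """
    # Too little data to judge safety: escalate instead of guessing.
    if (len(control_error_rates) < min_samples
            or len(canary_error_rates) < min_samples):
        return Decision.ALERT_HUMAN

    control = mean(control_error_rates)
    canary = mean(canary_error_rates)

    # The change is treated as safe if the canary is no worse than the
    # control by more than the allowed relative increase.
    if canary <= control * (1 + max_relative_increase):
        return Decision.ROLL_FORWARD
    return Decision.ROLL_BACK


if __name__ == "__main__":
    control = [0.010, 0.011, 0.009, 0.010, 0.010]
    canary = [0.020, 0.022, 0.019, 0.021, 0.020]
    print(evaluate_canary(control, canary))
```

A single mean comparison is deliberately simplistic. A real evaluation has to deal with noise, multiple metrics, and populations of different sizes, which are among the decisions the text sets aside for separate discussion.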
Canary Analysis Service (CAS) is a shared centralized service at Google that offers automatic (and often auto-configured) analysis of key metrics during a production change. We use CAS to analyze new versions of binaries, configuration changes, data set changes, and other production changes. CAS evaluates hundreds of thousands of production changes per day.