Flake-Aware Culprit Finding

Collin Johnston
Eric Nickell
CI/CD Industry Workshop (2021)

Abstract

Flaky, non-deterministic tests make culprit finding (finding the version of code at which a test started failing) difficult in large-scale repositories. A naive binary search (bisection) is unreliable when the test driving it is flaky, and retrying the test at every version examined makes the process expensive. We propose a flake-aware culprit finding system that takes a test's prior flakiness into account and uses a Bayesian probabilistic model to reduce the number of test executions required. The system finds culprits accurately, faster, and with fewer resources than binary search with retries.
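
The idea of combining a known flake rate with Bayesian updating can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on simplifying assumptions that the abstract does not state: a broken version always fails, a good version fails spuriously with a fixed and known probability flake_rate, the prior over culprits is uniform, and the next version to test is the one that splits the posterior mass most evenly. All function names (bayes_update, pick_version, find_culprit) are hypothetical.

```python
def bayes_update(posterior, version, failed, flake_rate):
    """Reweight P(culprit == c) after one test run at `version`.

    Assumption: versions at or after the culprit always fail; earlier
    versions fail only by flaking, with probability `flake_rate`.
    """
    updated = []
    for culprit, p in enumerate(posterior):
        broken = culprit <= version
        if failed:
            likelihood = 1.0 if broken else flake_rate
        else:
            likelihood = 0.0 if broken else 1.0 - flake_rate
        updated.append(p * likelihood)
    total = sum(updated)
    if total == 0.0:
        raise ValueError("observation is impossible under the assumed model")
    return [p / total for p in updated]


def pick_version(posterior):
    """Pick the version whose cumulative posterior mass is closest to 0.5.

    The last version is skipped: every candidate culprit breaks it, so
    running the test there carries no information.
    """
    best, best_gap, cumulative = 0, float("inf"), 0.0
    for version in range(len(posterior) - 1):
        cumulative += posterior[version]
        gap = abs(cumulative - 0.5)
        if gap < best_gap:
            best, best_gap = version, gap
    return best


def find_culprit(num_versions, run_test, flake_rate,
                 confidence=0.99, max_runs=1000):
    """Return (most likely culprit index, number of test runs used).

    `run_test(version)` must return True when the test fails.
    """
    posterior = [1.0 / num_versions] * num_versions  # uniform prior (assumed)
    runs = 0
    while max(posterior) < confidence and runs < max_runs:
        version = pick_version(posterior)
        failed = run_test(version)
        posterior = bayes_update(posterior, version, failed, flake_rate)
        runs += 1
    culprit = max(range(num_versions), key=posterior.__getitem__)
    return culprit, runs


if __name__ == "__main__":
    # Deterministic stand-in for a real test runner: version 6 is the culprit.
    print(find_culprit(10, lambda v: v >= 6, flake_rate=0.1))
```

Under these assumptions a passing run is decisive (it rules out every culprit at or before the tested version), while a failing run only shifts probability mass, by a factor tied to the flake rate. That asymmetry is what lets a probabilistic search spend repeated runs only where a failure is still ambiguous, rather than retrying at every bisection step.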
