Tim A. D. Henderson
Dr. Tim A. D. Henderson is a researcher and software engineer at Google. At Google he works on test automation, continuous integration, test case selection, test case prioritization, and fault localization. Tim received a PhD in Computer Science from Case Western Reserve University in Cleveland, Ohio in 2017. Other research interests include data mining, graph mining, software engineering, software testing, code clones, fault localization and program analysis. Publications, articles, and additional information can be found on his website: https://hackthology.com/
Research Areas
Authored Publications
Sort By
Flake Aware Culprit Finding
Eric Nickell
Collin Johnston
Avi Kondareddy
Proceedings of the 16th IEEE International Conference on Software Testing, Verification and Validation (ICST 2023), IEEE (to appear)
Preview abstract
When a change introduces a bug into a large software repository, there is
often a delay between when the change is committed and when bug is detected.
This is true even when the bug causes an existing test to fail! These delays
are caused by resource constraints which prevent the organization from running
all of the tests on every change. Due to the delay, a Continuous Integration
system needs to locate buggy commits. Locating them is complicated by flaky
tests that pass and fail non-deterministically. The flaky tests introduce
noise into the CI system requiring costly reruns to determine if a failure was
caused by a bad code change or caused by non-deterministic test behavior. This
paper presents an algorithm, Flake Aware Culprit Finding, that locates
buggy commits more accurately than a traditional bisection search. The
algorithm is based on Bayesian inference and noisy binary search, utilizing
prior information about which changes are most likely to contain the bug. A
large scale empirical study was conducted at Google on 13,000+ test breakages.
The study evaluates the accuracy and cost of the new algorithm versus a
traditional deflaked bisection search.
View details
Preview abstract
In an ideal Continuous Integration (CI) workflow, all potentially impacted builds/tests would be run before submission for every proposed change. In large-scale environments like Google’s mono-repository, this is not feasible to do in terms of both latency and compute cost, given the frequency of change requests and overall size of the codebase. Instead, the compromise is to run more comprehensive testing at later stages of the development lifecycle – batched after submission. These runs detect newly broken tests. Automated culprit finders determine which change broke the tests, the culprit change.
In order to make testing at Google more efficient and lower latency, TAP Postsubmit is developing a new scheduling algorithm that utilizes Bug Prediction metrics, features from the change, and historical information about the tests and targets to predict and rank targets by their likelihood to fail. This presentation examines the association between some of our selected features with culprits in Google3.
View details
Improving Fault Localization by Integrating Valueand Predicate Based Causal Inference Techniques
Yigit Kucuk
Andy Podgurski
Proceedings of the International Conference on Software Engineering, IEEE/ACM (2021) (to appear)
Preview abstract
Statistical fault localization (SFL) techniques use execution profiles and success/failure information from software executions, in conjunction with statistical inference, to automatically score program elements based on how likely they are to be faulty. SFL techniques typically employ one type of profile data: either coverage data, predicate outcomes, or variable values. Most SFL techniques actually measure correlation, not causation, between profile values and success/failure, and so they are subject to confounding bias that distorts the scores they produce. This paper presents a new SFL technique, named UniVal, that uses causal inference techniques and machine learning to integrate information about both predicate outcomes and variable values to more accurately estimate the true failure-causing effect of program statements. UniVal was empirically compared to several coverage-based, predicate-based, and value-based SFL techniques on 800 program versions with real faults.
View details
Preview abstract
Flaky, non-deterministic tests make culprit finding (or finding the version of code where a test started failing) in large scale repositories difficult. A naive binary search (or bisect) algorithm will be unreliable if the test being used for the bisection is flaky. If retries are conducted for each tested version the process becomes expensive. We propose a flake-aware culprit finding system which takes into account the prior flakiness of a test and uses a Bayesian probabilistic model to reduce the number of test executions needed to achieve accurate culprit finding faster and using fewer resources than binary search with retries.
View details
Trends in Very Large Scale Continuous Integration
Google (to appear)
Preview abstract
Trends in Very Large Scale Continuous Integration
1. What is CI
2. What is meant by "very large scale"
3. Goals in large scale continuous integration (which often compete with each other)
4. Challenges to meeting those goals
5. Categories of solutions
6. Trends
View details