Avi Kondareddy

Avi works on Google's main CI system TAP on regression test selection and other CI-related software engineering problems.
Authored Publications
    Taming the Variants: Multi-Architecture Continuous Testing at Google
    Sushmita Azad
    Chandrakanth Chittappa
    Ali Esmaeeli
    Laura Macaddino
    Sam Manfreda
    David Margolin
    Dharma Naidu
    Sabuj Pattanayek
    Sachin Sable
    Ruslan Sakevych
    Dushyant Acharya
    Adrian Berding
    Kevin Crossan
    Wolff Dobson
    Abhay Singh
    19th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2026, Daejeon, Republic of Korea, IEEE
    Enterprises are increasingly adopting multiple general-purpose computer architectures in the data center. This leads to new testing challenges as it creates demand to qualify the software for the additional architectures. Naively double-testing all software for both architectures is costly and unnecessary. Further, reconfiguring CI/CD to take advantage of the new architecture can be non-trivial at scale. This paper introduces CI/CD variants and an optimized testing cycle to solve these twin challenges. We empirically evaluate our solution's impact on human and machine expenses using 44k projects at Google on real production data. First, we estimate saving ~25% of machine expenses at the negligible cost of a few delayed breakage detections per day. Second, we estimate a 90+% reduction in human cost for migrating the configuration. All features described in this paper are now Generally Available at Google and we report this as an empirical case study in scaling CI/CD to new architectures.
    SafeRevert: When Can Breaking Changes be Automatically Reverted?
    Sushmita Azad
    Eric Nickell
    2024 IEEE Conference on Software Testing, Verification and Validation (ICST), IEEE, Toronto, ON, Canada
    When bugs or defects are introduced into a large-scale software repository, they reduce productivity. Programmers working on related areas of the code will encounter test failures, compile breakages, or other anomalous behavior. On encountering these issues, they will need to troubleshoot and determine that their changes were not the cause of the error and that another change is at fault. They must then find that change and revert it to return the repository to a healthy state. In the past, our group has developed ways to identify the root cause (or culprit) change that introduced a test failure even when the test is flaky. This paper focuses on a related issue: At what point does the Continuous Integration system have enough evidence to support automatically reverting a change? We will motivate the problem, provide several methods to address it, and empirically evaluate our solution on a large set (34,000) of real-world breaking changes that occurred at Google.
    In an ideal Continuous Integration (CI) workflow, all potentially impacted builds/tests would be run before submission for every proposed change. In large-scale environments like Google’s mono-repository, this is not feasible to do in terms of both latency and compute cost, given the frequency of change requests and overall size of the codebase. Instead, the compromise is to run more comprehensive testing at later stages of the development lifecycle – batched after submission. These runs detect newly broken tests. Automated culprit finders determine which change broke the tests, the culprit change. In order to make testing at Google more efficient and lower latency, TAP Postsubmit is developing a new scheduling algorithm that utilizes Bug Prediction metrics, features from the change, and historical information about the tests and targets to predict and rank targets by their likelihood to fail. This presentation examines the association between some of our selected features with culprits in Google3.
    Flake Aware Culprit Finding
    Eric Nickell
    Collin Johnston
    Proceedings of the 16th IEEE International Conference on Software Testing, Verification and Validation (ICST 2023), IEEE (to appear)
    When a change introduces a bug into a large software repository, there is often a delay between when the change is committed and when the bug is detected. This is true even when the bug causes an existing test to fail! These delays are caused by resource constraints which prevent the organization from running all of the tests on every change. Due to the delay, a Continuous Integration system needs to locate buggy commits. Locating them is complicated by flaky tests that pass and fail non-deterministically. The flaky tests introduce noise into the CI system, requiring costly reruns to determine if a failure was caused by a bad code change or by non-deterministic test behavior. This paper presents an algorithm, Flake Aware Culprit Finding, that locates buggy commits more accurately than a traditional bisection search. The algorithm is based on Bayesian inference and noisy binary search, utilizing prior information about which changes are most likely to contain the bug. A large-scale empirical study was conducted at Google on 13,000+ test breakages. The study evaluates the accuracy and cost of the new algorithm versus a traditional deflaked bisection search.
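The idea named in the abstract above, Bayesian inference combined with a noisy binary search, can be illustrated with a small sketch. This is not the paper's algorithm. It is a minimal illustration under simplifying assumptions: exactly one culprit exists among the candidate commits, the test always fails once the culprit has landed, and before that it fails spuriously with a known `flake_rate`. The function names (`update`, `next_commit`, `find_culprit`), the median-splitting rule, and the `confidence` stopping threshold are all hypothetical choices made for this example.

```python
import random


def update(posterior, tested, failed, flake_rate):
    """Bayes update of P(culprit == c) after one test run at commit `tested`.

    Assumed model: if the culprit is at or before `tested`, the test always
    fails; otherwise it fails only by flaking, with probability `flake_rate`.
    """
    weighted = []
    for c, p in enumerate(posterior):
        p_fail = 1.0 if c <= tested else flake_rate
        likelihood = p_fail if failed else 1.0 - p_fail
        weighted.append(p * likelihood)
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observation is impossible under the assumed model")
    return [w / total for w in weighted]


def next_commit(posterior):
    """Pick the commit whose cumulative posterior mass is closest to 0.5.

    The last commit is skipped: every hypothesis predicts it is broken,
    so testing it would carry no information.
    """
    best, best_gap, cumulative = 0, float("inf"), 0.0
    for i, p in enumerate(posterior[:-1]):
        cumulative += p
        gap = abs(cumulative - 0.5)
        if gap < best_gap:
            best, best_gap = i, gap
    return best


def find_culprit(prior, run_test, flake_rate, confidence=0.99, max_runs=1000):
    """Run tests until one commit holds at least `confidence` posterior mass.

    `prior[c]` is the prior probability that commit c is the culprit, and
    `run_test(i)` returns True when the test fails at commit i.
    Returns (culprit_index, number_of_test_runs).
    """
    posterior = list(prior)
    for runs in range(max_runs):
        best = max(range(len(posterior)), key=posterior.__getitem__)
        if posterior[best] >= confidence:
            return best, runs
        i = next_commit(posterior)
        posterior = update(posterior, i, run_test(i), flake_rate)
    raise RuntimeError("no commit reached the confidence threshold")


if __name__ == "__main__":
    rng = random.Random(0)
    true_culprit, flake_rate, n = 41, 0.2, 64

    def flaky_test(i):
        # Fails for real at or after the culprit, and spuriously before it.
        return i >= true_culprit or rng.random() < flake_rate

    uniform_prior = [1.0 / n] * n
    print(find_culprit(uniform_prior, flaky_test, flake_rate))
```

With `flake_rate` set to 0 and a uniform prior, the sketch reduces to ordinary bisection. A non-uniform `prior` shifts the first probes toward the commits considered most suspicious, which is the role the abstract assigns to prior information.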