Who Broke the Build? Automatically Identifying Changes That Induce Test Failures In Continuous Integration at Google Scale

Proceedings of the 39th International Conference on Software Engineering: Software Engineering in Practice Track, IEEE Press, Buenos Aires, Argentina(2017), pp. 113-122 (to appear)
Google Scholar


Quickly identifying and fixing code changes that introduce regressions is critical to keep the momentum on software development, especially in very large scale software repositories with rapid development cycles, such as at Google. Identifying and fixing such regressions is one of the most expensive, tedious, and time consuming tasks in the software development life-cycle. Therefore, there is a high demand for automated techniques that can help developers identify such changes while minimizing manual human intervention. Various techniques have recently been proposed to identify such code changes. However, these techniques have shortcomings that make them unsuitable for rapid development cycles as at Google. In this paper, we propose a novel algorithm to identify code changes that introduce regressions, and discuss case studies performed at Google on 140 projects. Based on our case studies, our algorithm automatically identifies the change that introduced the regression in the top-5 among thousands of candidates 82% of the time, and provides considerable savings on manual work developers need to perform

Research Areas