Jump to Content

Improving Fault Localization by Integrating Valueand Predicate Based Causal Inference Techniques

Yigit Kucuk
Andy Podgurski
Proceedings of the International Conference on Software Engineering, IEEE/ACM (2021) (to appear)

Abstract

Statistical fault localization (SFL) techniques use execution profiles and success/failure information from software executions, in conjunction with statistical inference, to automatically score program elements based on how likely they are to be faulty. SFL techniques typically employ one type of profile data: either coverage data, predicate outcomes, or variable values. Most SFL techniques actually measure correlation, not causation, between profile values and success/failure, and so they are subject to confounding bias that distorts the scores they produce. This paper presents a new SFL technique, named UniVal, that uses causal inference techniques and machine learning to integrate information about both predicate outcomes and variable values to more accurately estimate the true failure-causing effect of program statements. UniVal was empirically compared to several coverage-based, predicate-based, and value-based SFL techniques on 800 program versions with real faults.