A Regression Paradox for Linear Models: Sufficient Conditions and Relation to Simpson's Paradox

Thomas Bengtsson
Tin Kam Ho
The American Statistician, 63(2009), pp. 218-225

Abstract

An analysis of customer survey data using direct and reverse linear regression leads to inconsistent conclusions with respect to the effect of a group variable. This counterintuitive phenomenon, called the regression paradox, causes seemingly contradictory group effects when the predictor and regressand are interchanged. Using analytical developments as well as geometric arguments, we describe sufficient conditions under which the regression paradox will appear in linear Gaussian models. The results show that the phenomenon depends on a distribution shift between the groups relative to the predictability of the model. As a consequence, the paradox can appear naturally in certain distributions, and need not be caused by sampling error or incorrectly specified models. Simulations verify that the paradox may appear in more general, non-Gaussian settings. An interesting geometric connection to Simpson's paradox is provided.
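
The phenomenon can be reproduced in a small simulation. The sketch below is an illustration under assumed settings, not the paper's own experiment: two groups are drawn from bivariate Gaussians with unit variances and a common correlation `rho`, and the second group's mean is shifted by the same amount `shift` in both variables. The function names and all parameter values are hypothetical choices made for this example. In this toy setup the population group coefficient is `shift * (1 - rho)` in both the direct regression (y on x and group) and the reverse regression (x on y and group).

```python
import numpy as np


def group_effect(response, predictor, group):
    """Return the OLS coefficient on `group` in response ~ 1 + predictor + group."""
    design = np.column_stack([np.ones_like(predictor), predictor, group])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[2]


def simulate(n=5000, rho=0.5, shift=1.0, seed=0):
    """Draw n points per group; group 1 is shifted by `shift` in both x and y.

    All values here are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    a = rng.multivariate_normal([0.0, 0.0], cov, size=n)      # group 0
    b = rng.multivariate_normal([shift, shift], cov, size=n)  # group 1
    x = np.concatenate([a[:, 0], b[:, 0]])
    y = np.concatenate([a[:, 1], b[:, 1]])
    g = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y, g


x, y, g = simulate()

# Direct regression: y on x and group.
direct = group_effect(y, x, g)
# Reverse regression: x on y and group (predictor and regressand interchanged).
reverse = group_effect(x, y, g)

print(f"direct  group effect (y | x): {direct:.3f}")
print(f"reverse group effect (x | y): {reverse:.3f}")
```

Both estimated coefficients come out positive. Read naively, the direct fit says group 1 has higher y at a given x, while the reverse fit says group 1 has higher x at a given y, which would suggest lower y at a given x. The two readings appear contradictory even though the model is correctly specified and the sample is large.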
