Problems of analyzing complex sample design data with common statistical software packages

Tzuyun Chin
2002 annual meeting of the American Educational Research Association (2003)

Abstract

The technical aspects of various complex sample designs and their impact on parameter estimation and statistical inference have been well discussed. However, as noted by Brogan (1998), many researchers who analyze complex sample data are still not accustomed to applying proper procedures or specialized software, or do not even recognize the need for them. To illustrate the pitfalls of ignoring complex sample design when applying standard statistical procedures in common software, Brogan (1998) compared estimates and their estimated standard errors from SAS with those from SUDAAN. The data Brogan used had a total sample size of 20,049 and came from a random-digit-dialing (RDD) survey with unequal selection probabilities, poststratification, and nonresponse adjustment. Although RDD designs usually exhibit small design effects, she still found substantial discrepancies in (1) estimated prevalence, (2) estimated chi-square values, and (3) the corresponding standard errors of estimated prevalence and chi-square values. The focus of the current paper is to provide informative, not highly technical discussion and recommendations that help educational researchers make better use of data from complex sample designs. We provide examples with educational data to illustrate the effect of ignoring complex sampling design. These illustrations extend Brogan (1998) in the following respects: (1) continuous variables, which are common in educational research, are used; (2) estimation of means is illustrated; and (3) estimation of regression coefficients is illustrated. Because regression models are widely used by educational researchers from various fields and with various backgrounds, the examples using such models are intended to reach a broader audience.
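The kind of discrepancy described above can be sketched with a small numerical illustration. The example below is not taken from the paper or from Brogan (1998): the scores, weights, and school identifiers are invented, and the design-based standard error assumes a single stratum with schools treated as primary sampling units drawn with replacement (a Taylor linearization for a weighted mean). It contrasts the estimate a standard procedure would report under a simple-random-sampling assumption with one that uses the weights and clustering.

```python
import math
from collections import defaultdict

def naive_mean_se(y):
    """Mean and standard error assuming simple random sampling (ignores design)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, math.sqrt(var / n)

def design_mean_se(y, w, cluster):
    """Weighted mean with a linearization standard error.

    Assumption (not from the source): one stratum, clusters are PSUs
    sampled with replacement.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Sum the linearized residuals within each cluster.
    z_by_cluster = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        z_by_cluster[ci] += wi * (yi - mean) / total_w
    k = len(z_by_cluster)
    # Cluster totals of z sum to zero, so no centering term is needed.
    var = k / (k - 1) * sum(z ** 2 for z in z_by_cluster.values())
    return mean, math.sqrt(var)

# Hypothetical data: 4 schools, 3 students each; the two lower-scoring
# schools were undersampled and therefore carry larger weights.
schools = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
scores = [52, 55, 58, 60, 63, 66, 75, 78, 81, 88, 90, 92]
weights = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10, 10, 10]

naive_mean, naive_se = naive_mean_se(scores)
design_mean, design_se = design_mean_se(scores, weights, schools)

print(f"ignoring design:  mean={naive_mean:.2f}  se={naive_se:.2f}")
print(f"design-based:     mean={design_mean:.2f}  se={design_se:.2f}")
```

In this invented data set, ignoring the weights shifts the estimated mean, and ignoring the clustering understates the standard error. Specialized survey software applies the same logic to real designs with strata, finite population corrections, and more elaborate variance estimators.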

Research Areas