A Comparison of Questionnaire Biases Across Sample Providers
Abstract
Survey research, like all methods, is fraught with potential sources of error that can significantly affect the validity and reliability of results. Four major types of error are common to surveys as a data collection method: (1) coverage error, arising when certain segments of a target population are excluded; (2) nonresponse error, arising when not all of those selected for a sample respond; (3) sampling error, which results from the fact that surveys collect data from only a subset of the population being measured; and (4) measurement error. Measurement error can arise from the wording and design of survey questions (i.e., instrument error), as well as from variability in respondent ability and motivation (i.e., respondent error) [17].
This paper focuses primarily on measurement error as a source of bias in surveys. It is well established that instrument error [34, 40] and respondent error (e.g., [21]) can yield meaningful differences in results. For example, variations in response order, response scales, descriptive text, or images used in a survey can introduce instrument error that skews response distributions. Certain types of questions can trigger other instrument error biases, such as the tendency to agree with statements presented in an agree/disagree format (acquiescence bias) or the tendency to underreport undesirable behaviors and overreport desirable ones (social desirability bias). Respondent error is largely related to the amount of cognitive effort required to answer a survey and arises when respondents are either unable or unwilling to exert that effort [21].
Such measurement error has been compared across survey modes, such as face-to-face, telephone, and Internet (e.g., [9, 18]), but little work has compared different Internet samples, such as crowdsourcing task platforms (e.g., Amazon’s Mechanical Turk), paywall surveys (e.g., Google Consumer Surveys), opt-in panels (e.g., Survey Sampling International), and probability-based panels (e.g., the GfK KnowledgePanel). Because these samples differ in recruiting, context, and incentives, respondents may be more or less motivated to put effort into their responses, leading to different degrees of bias in different samples. The specific instruments deployed to respondents in these different samples can also exacerbate the situation by requiring more or less cognitive effort to answer satisfactorily.
The present study has two goals:
1. Investigate the impact of question wording on response distributions in order to measure the strength of common survey biases arising from instrument and respondent error (see the illustrative sketch following this list).
2. Compare the variance in the degree of these biases across Internet survey samples with differing characteristics in order to determine whether certain types of samples are more susceptible to certain biases than others.
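To make the first goal concrete, the following minimal sketch (in Python, using entirely hypothetical counts rather than data from this study) shows one way a wording effect such as acquiescence bias could be quantified within a single sample: the same question is fielded in an agree/disagree format and in an item-specific format, and the two response distributions are compared with a chi-square test.

```python
# Minimal sketch (not this paper's actual analysis): quantifying a
# question-wording effect by comparing response distributions for two
# hypothetical wordings of the same item within one sample.
# All counts below are made up for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: question wording (agree/disagree vs. item-specific format)
# Columns: respondents endorsing the positive pole vs. not endorsing it
counts = np.array([
    [310, 190],   # agree/disagree wording: 62% "agree"
    [255, 245],   # item-specific wording: 51% endorse the same position
])

chi2, p, dof, _ = chi2_contingency(counts)
shift = counts[0, 0] / counts[0].sum() - counts[1, 0] / counts[1].sum()

print(f"Endorsement shift attributable to wording: {shift:.1%}")
print(f"Chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

Repeating such a comparison within each sample provider would then allow the size of the wording effect, rather than the raw response distributions, to be compared across samples, which is the substance of the second goal.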