Google Research

Detecting Response Scale Inconsistency in Real Time

74th Annual Conference of the American Association for Public Opinion Research (2019) (to appear)


Researchers often face the challenge of seemingly conflicting respondent answers to different questions about the same subject. Some respondents will give positive open-ended evaluations of a subject immediately after having provided a low rating for the same subject. In some proportion of these possibly confused cases, the culprit may be features of the survey design that are influencing respondent answers in unexpected ways.

This paper examines an experiment in which a commonly used 5-point, fully labeled, bipolar scale was presented in 4 ways: vertical orientation with positive on top; vertical with negative on top; horizontal with positive on the left; and horizontal with positive on the right. We look at two groups of respondents: those who respond via desktop and those who respond via mobile. Each respondent was assigned to two out of five unipolar and two out of five bipolar response scales. The questions asked about physical and mental health, financial situation, and work satisfaction.

In total, we used about 4,500 U.S. respondents from the online panel Survey Sampling International. We had about 450 respondents for each desktop condition and about 650 for each mobile condition (2 by 4 design). Mobile respondents tended to be younger, more often female, and slightly less educated than desktop respondents.

For each condition, a follow-up open-ended question asked why the respondent gave the score that they gave. In real time, right after the respondent completed the open-ended response, the response was auto-coded by the Google Cloud Natural Language API AnalyzeSentiment web service on a scale of -1 to 1. The sentiment and the scale response were then checked for inconsistency (the scale answer was positive and the open-ended response was auto-coded as having negative sentiment, or the scale answer was negative and the open-ended response was auto-coded as having positive sentiment). If an inconsistency was identified, we gave the respondent the opportunity to change one of their answers, and respondents who elected to do so were asked why they chose to change their response.
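The consistency check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes the scale answer is coded 1 to 5 with 5 as the most positive option and 3 as the neutral midpoint, and it treats sentiment scores inside a small band around zero as neutral. The function names and the `neutral_band` value of 0.25 are hypothetical; the abstract does not report the cutoffs that were used.

```python
def scale_polarity(scale_value: int, midpoint: int = 3) -> int:
    """Return +1, -1 or 0 for a rating above, below or at the midpoint.

    Assumes higher codes mean more positive answers (hypothetical coding).
    """
    return (scale_value > midpoint) - (scale_value < midpoint)


def sentiment_polarity(score: float, neutral_band: float = 0.25) -> int:
    """Map a sentiment score in [-1, 1] to +1, -1 or 0.

    Scores within +/- neutral_band count as neutral; the band width is an
    illustrative assumption, not a value taken from the study.
    """
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0


def is_inconsistent(scale_value: int, sentiment_score: float,
                    midpoint: int = 3, neutral_band: float = 0.25) -> bool:
    """Flag a respondent whose rating and open-ended sentiment point in
    opposite directions (positive vs. negative or negative vs. positive)."""
    rating = scale_polarity(scale_value, midpoint)
    sentiment = sentiment_polarity(sentiment_score, neutral_band)
    # The product is -1 only when one side is positive and the other negative.
    return rating * sentiment == -1
```

With this sketch, a low rating followed by a clearly positive comment, such as `is_inconsistent(1, 0.8)`, would be flagged, while a neutral rating or a near-zero sentiment score would not trigger a follow-up prompt.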

Paradata such as time per question, number of clicks, and answer changes were also collected. In this work we address two main questions: which of the 4 scales yields more consistent responses, and how accurate the auto-coding of sentiment is when compared against manual coding.

The main findings were the following:

- Mobile respondents wanted to change response options more often than desktop respondents.
- Answering a scale from the negative end almost always takes longer.
- Mobile respondents showed higher inconsistency.
- Unipolar scales showed higher inconsistency overall than bipolar scales.
- Unipolar questions showed higher inconsistency for the horizontal positive-left and vertical negative-top scales.
- Bipolar scales showed higher inconsistency for vertically oriented scales.
- Manual review of the Google NLP sentiment coding of the open-ended answers showed that its quality was very good, given the amount of text written per open-ended response.
