Aaron Sedley
Aaron Sedley is a Staff User Experience Researcher at Google, focused on measuring users' attitudes via surveys. In 2006, Aaron initiated Happiness Tracking Surveys (HaTS), a platform that measures attitudes in context of product usage and is deployed across Google's products.
In addition to his leadership on HaTS, Aaron consults across Google on survey methodology, planning, and implementation. Aaron has also focused on change aversion at Google, establishing principles to minimize negative reactions when launching changes to a familiar product.
Prior to joining Google, Aaron held research positions with New York Times Digital, Young & Rubicam, and the Carnegie Endowment for International Peace. He earned a bachelor's degree in Government from Wesleyan University (CT).
Authored Publications
Abstract
Survey communities have regularly discussed optimal questionnaire design for attitude measurement. Consumer satisfaction, specifically, has historically been treated as a bipolar construct (Thurstone, 1931; Likert, 1932), but some argue it is actually two separate unipolar constructs, which may yield signals with separable and interactive dynamics (Cacioppo & Berntson, 1994).
Earlier research has explored whether attitude measurement validity can be optimized with a branching design that involves two questions: a question about the direction of an attitude (e.g., positive, negative) followed by a question using a unipolar scale, about the intensity of the selected direction (Krosnick & Berent, 1993).
The current experiment evaluated differences across a variety of question designs for in-product contextual satisfaction surveys (Sedley & Müller, 2016). Specifically, we randomly assigned respondents to the following designs:
Traditional 5-point bipolar satisfaction scale (fully labeled)
Branched: a directional question (satisfied, neither satisfied nor dissatisfied, dissatisfied), followed by a unipolar question on intensity (5-point scale from “not at all” to “extremely,” fully labeled)
Unipolar satisfaction scale, followed by a unipolar dissatisfaction scale (both use a 5-point scale from “not at all” to “extremely,” fully labeled)
Unipolar dissatisfaction scale, followed by a unipolar satisfaction scale (both use a 5-point scale from “not at all” to “extremely,” fully labeled)
The experiment adds to the attitude question design literature by evaluating designs based on criterion validity evidence, namely the relationship between survey responses and linked user behaviors.
Results show that no format clearly outperformed the ‘traditional’ bipolar scale format for the criteria included. Separate unipolar scales performed poorly and may be awkward or annoying for respondents. Branching performed similarly to the traditional bipolar design but showed no gain in validity; it is also undesirable because it requires two questions instead of one, increasing respondent burden.
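To make the criterion-validity comparison concrete, below is a minimal sketch in Python (simulated data only; the design names, the behavioral criterion, and the latent-variable model are illustrative assumptions, not the study's actual pipeline). It randomly assigns respondents to the four designs and estimates each design's criterion validity as the correlation between survey responses and a linked behavior.

import random
from statistics import correlation  # requires Python 3.10+

# Illustrative only: simulated respondents, not the study's data.
DESIGNS = ["bipolar", "branched", "sat_then_dissat", "dissat_then_sat"]

def simulate_respondent():
    # A latent satisfaction level drives both the survey answer and a
    # linked behavior (e.g., continued product usage).
    latent = random.gauss(0.0, 1.0)
    design = random.choice(DESIGNS)             # random assignment
    response = latent + random.gauss(0.0, 0.8)  # noisy self-report
    behavior = latent + random.gauss(0.0, 1.2)  # noisy behavioral criterion
    return design, response, behavior

data = [simulate_respondent() for _ in range(4000)]

# Criterion validity per design: correlation of responses with behavior.
for d in DESIGNS:
    rs = [r for design, r, _ in data if design == d]
    bs = [b for design, _, b in data if design == d]
    print(f"{d:16s} r = {correlation(rs, bs):.3f} (n = {len(rs)})")

Under this null simulation all four designs show similar correlations by construction; in the actual experiment, differences in such correlations across randomly assigned conditions constitute the criterion-validity evidence.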
REFERENCES
Cacioppo, J. T., & Berntson, G. G. (1994). Relationship between attitudes and evaluative space: A critical review, with emphasis on the separability of positive and negative substrates. Psychological Bulletin, 115, 401–423.
Krosnick, J. A., & Berent, M. K. (1993). Comparisons of party identification and policy preferences: The impact of survey question format. American Journal of Political Science, 37, 941-964.
Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73, 304–324.
O’Muircheartaigh, C., Gaskell, G., & Wright, D. B. (1995). Weighing anchors: Verbal and numeric labels for response scales. Journal of Official Statistics, 11, 295–308.
Wang, R., & Krosnick, J. A. (2020). Middle alternatives and measurement validity: a recommendation for survey researchers. International Journal of Social Research Methodology, 23, 169-184.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Thurstone, L. L. (1931). Rank order as a psychological method. Journal of Experimental Psychology, 14, 187–201.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22, 5–55.
Sedley, A., & Müller, H. (2016, May). User experience considerations for contextual product surveys on smartphones. Paper presented at 71st annual conference of the American Association for Public Opinion Research, Austin, TX. Retrieved from https://ai.google/research/pubs/pub46422/
“Mixture of amazement at the potential of this technology and concern about possible pitfalls”: Public sentiment towards AI in 15 countries
Patrick Gage Kelley
Christopher Moessner
Allison Woodruff
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 44 (2021), pp. 28-46
Abstract
Public opinion plays an important role in the development of technology, influencing product adoption, commercial development, research funding, career choices, and regulation. In this paper we present results of an in-depth survey of public opinion of artificial intelligence (AI) conducted with over 17,000 respondents spanning fifteen countries and six continents. Our analysis of open-ended responses regarding sentiment towards AI revealed four key themes (exciting, useful, worrying, and futuristic) which appear to varying degrees in different countries. These sentiments, and their relative prevalence, may inform how the public influences the development of AI.
Exciting, Useful, Worrying, Futuristic: Public Perception of Artificial Intelligence in 8 Countries
Patrick Gage Kelley
Christopher Moessner
Andreas Kramm
David T. Newman
Allison Woodruff
AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (2021), 627–637
Abstract
As the influence and use of artificial intelligence (AI) have grown and its transformative potential has become more apparent, many questions have been raised regarding the economic, political, social, and ethical implications of its use. Public opinion plays an important role in these discussions, influencing product adoption, commercial development, research funding, and regulation. In this paper we present results of an in-depth survey of public opinion of artificial intelligence conducted with 10,005 respondents spanning eight countries and six continents. We report widespread perception that AI will have significant impact on society, accompanied by strong support for the responsible development and use of AI, and also characterize the public’s sentiment towards AI with four key themes (exciting, useful, worrying, and futuristic) whose prevalence distinguishes response to AI in different countries.
Scaling the smileys: A multicountry investigation
Joseph M. Paxton
The Essential Role of Language in Survey Research, RTI Press (2020), pp. 231-242
Abstract
Contextual user experience (UX) surveys are brief surveys embedded in a website or mobile app (Sedley & Müller, 2016). In these surveys, emojis (e.g., smiley faces, thumbs, stars), with or without text labels, are often used as answer scales. Previous investigations in the United States found that carefully designed smiley faces may distribute fairly evenly along a numerical scale (0–100) for measuring satisfaction (Sedley, Yang, & Hutchinson, 2017). The present study investigated the scaling properties and construct meaning of smiley faces in six countries. We collected open-ended descriptions of smileys to understand construct interpretations across countries. We also assessed numeric meaning of a set of five smiley faces on a 0–100 range by presenting each face independently, as well as in context with other faces with and without endpoint text labels.
Abstract
Contextual user experience (UX) surveys are brief surveys embedded in a website or mobile app and triggered during or after a user-product interaction. They are used to measure user attitude and experience in the context of actual product usage. In these surveys, smiley faces (with or without verbal labels) are often used as answer scales for questions measuring constructs such as satisfaction. From studies done in the US in 2016 and 2017, we found that carefully designed smiley faces may distribute fairly evenly along a numerical scale (0-100), and that scaling properties further improved with endpoint verbal labels (Sedley, Yang, & Hutchinson, presented at AAPOR 2017).
As mobile app products propagate around the world, the survey research community is compelled to test the generalizability of single-population findings (often from the US) to cross-national, cross-language, and cross-cultural contexts.
The current study builds upon the above scaling study, as well as work by cross-cultural survey methodologists investigating the meanings of verbal scales (e.g., Smith, Mohler, Harkness, & Onodera, 2005). We investigate the scaling properties of smiley faces in a number of distinct cultural and language settings: US (English), Japan (Japanese), Germany (German), Spain (Spanish), India (English), and Brazil (Portuguese).
Specifically, we explore construct alignment by capturing respondents’ own interpretations of the smiley face variants, via open-ended responses.
We also assess the scaling properties of various smiley designs by measuring each smiley face on a 0-100 scale, to calculate semantic distance between smileys. This is done by presenting each smiley face both independently and in context with other smileys. We additionally evaluate the effect of including verbal endpoint labels with the smiley scale.
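As a rough sketch of the scaling analysis (all placement numbers below are invented for illustration; this is not the study's data or code), mean 0-100 placements per smiley yield the semantic distances between adjacent scale points, which should approach 25 for five equally spaced faces:

from statistics import mean

# Hypothetical 0-100 placements for five smiley faces (invented numbers).
placements = {
    "very_sad":   [2, 5, 8, 4, 6],
    "sad":        [22, 28, 25, 30, 24],
    "neutral":    [48, 50, 52, 49, 51],
    "happy":      [70, 74, 72, 76, 73],
    "very_happy": [95, 98, 96, 99, 97],
}

means = {face: mean(vals) for face, vals in placements.items()}
faces = list(placements)

# Gaps between adjacent faces; ~25 on a 0-100 range indicates equal spacing.
for a, b in zip(faces, faces[1:]):
    print(f"{a} -> {b}: mean gap = {means[b] - means[a]:.1f}")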
Abstract
With increased adoption and usage of mobile apps for a variety of purposes, it is important to establish attitudinal measurement designs to measure users’ experiences in context of actual app usage. Such designs should balance mobile UX considerations with survey data quality.
To inform choices on contextual mobile survey design, we conduct a comparative evaluation of stars vs smileys as graphical scales for in-context mobile app satisfaction measurement, as follows:
To evaluate and compare data quality across scale types, we look at the distributions of the numerical ratings by anchor point stimulus to evaluate the extremity and scale point distances. We also assess criterion validity for stars and smileys, where feasible.
To evaluate user experience across variants, we compare key survey-related signals such as response and dismiss rates, the dismiss/response ratio, and time-to-response (a sketch of these computations follows).
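The survey-related signals above can be computed from per-impression logs. The sketch below assumes a hypothetical log format (an outcome label plus seconds to action) purely to illustrate the metric definitions; it is not the instrumented pipeline:

from statistics import median

# Hypothetical per-impression logs: each record is (outcome, seconds_to_action),
# where outcome is "response", "dismiss", or "ignored" (no action taken).
def ux_metrics(impressions):
    n = len(impressions)
    responses = [t for outcome, t in impressions if outcome == "response"]
    dismisses = [t for outcome, t in impressions if outcome == "dismiss"]
    return {
        "response_rate": len(responses) / n,
        "dismiss_rate": len(dismisses) / n,
        "dismiss_response_ratio": len(dismisses) / max(len(responses), 1),
        "median_time_to_response_s": median(responses) if responses else None,
    }

stars   = [("response", 4.2), ("dismiss", 1.1), ("ignored", 0.0), ("response", 3.8)]
smileys = [("response", 2.9), ("response", 3.1), ("dismiss", 1.4), ("ignored", 0.0)]
print("stars:  ", ux_metrics(stars))
print("smileys:", ux_metrics(smileys))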
To Smiley, Or Not To Smiley? Considerations and Experimentation to Optimize Data Quality and User Experience for Contextual Product Satisfaction Measurement
https://docs.google.com/a/google.com/presentation/d/e/2PACX-1vQMmPQ6xeyUWbA_tey23GiXJ8SUdZWn8FiL5E5x7BGrKOLe7Im8UnXOfRxBkFB0OYo_7ioovOpVztB1/pub?start=false&loop=false&delayms=5000 (2017)
Abstract
Happiness Tracking Surveys (HaTS) at Google are designed to measure satisfaction with a product or feature in context of actual usage. Smiley faces have been added to a fully-labeled satisfaction scale, to increase discoverability of the survey and response rates. Sensitive to the potential variety of effects from images and visual presentation in online surveys (Tourangeau, Conrad & Couper, 2013), this presentation will describe research designed to inform and optimize Google's use of smileys in Happiness Tracking Surveys across products and platforms:
1) We explore construct alignment by capturing users' interpretations of the various smiley faces, via open-ended responses. This data shows meaningful variation across potential smiley images, which informed design decisions.
2) We assess scaling properties of smileys by measuring each smiley independently on a 0-100 scale, to calculate semantic distance between smileys in order to achieve equally-spaced intervals between scale points (Klockars & Yamagishi, 1988).
3) We describe considerations and evaluative metrics for a smiley-based scale with endpoint text labels, to be used with mobile apps and devices.
Abstract
As use of smartphone-based tools & services broadens and deepens, such products should be continuously evaluated and optimized to meet users' needs. In-product surveys are one way to gather attitudinal and user experience data at scale, in context of actual experiences. However, given space constraints, OS variance, product design differences, and app vs mobile web options, launching contextual surveys on smartphones requires considerable attention to several user experience aspects that can also impact data quality. In this talk, we will discuss a variety of practical UX considerations for in-context surveys on smartphones, drawing on real-world implementation and experimentation for several of Google's most-used mobile products. In particular, we'll discuss issues such as when to trigger a survey, sampling across platforms (mobile vs desktop), invitations vs inline questions, survey length, question types, device size, screen orientation, and survey interaction with the host product/app. We will also explore the effect of design and text variants on smartphone survey response rates and response distributions.
Designing Surveys for HCI Research
CHI '15 Extended Abstracts on Human Factors in Computing Systems, ACM, New York, NY, USA (2015), pp. 2485-2486
Abstract
Online surveys are widely used in human-computer interaction (HCI) to gather feedback and measure satisfaction; at a glance, many tools are available and the cost of conducting surveys appears low. However, there is a wide gap between quick-and-dirty surveys and surveys that are properly planned, constructed, and analyzed. This course examines survey research approaches that meet HCI goals: selecting the appropriate sampling method, questionnaire design best practices, identifying and avoiding common survey biases, and questionnaire evaluation. Attendees will gain an appreciation for the breadth and depth of surveys in HCI, combined with keys to conducting valid, reliable, and impactful survey research themselves.
Perceived Frequency of Advertising Practices
Allen Collins
Allison Woodruff
Symposium on Usable Privacy and Security (SOUPS), Privacy Personas and Segmentation Workshop, Usenix (2015)
Abstract
In this paper, we introduce a new construct for measuring individuals’ privacy-related beliefs and understandings, namely their perception of the frequency with which information about individuals is gathered and used by others for advertising purposes. We introduce a preliminary instrument for measuring this perception, called the Ad Practice Frequency Perception Scale. We report data from a survey using this instrument, as well as the results of an initial clustering of participants based on this data. Our results, while preliminary, suggest that this construct may have future potential to characterize and segment individuals, and is worthy of further exploration.
A Comparison of Questionnaire Biases Across Sample Providers
Victoria Sosik
American Association for Public Opinion Research, 2015 Annual Conference (2015)
Abstract
Survey research, like all methods, is fraught with potential sources of error that can significantly affect the validity and reliability of results. There are four major types of error common to surveys as a data collection method: (1) coverage error arising from certain segments of a target population being excluded, (2) nonresponse error where not all those selected for a sample respond, (3) sampling error which results from the fact that surveys only collect data from a subset of the population being measured, and (4) measurement error. Measurement error can arise from the wording and design of survey questions (i.e., instrument error), as well as the variability in respondent ability and motivation (i.e., respondent error) [17].
This paper focuses primarily on measurement error as a source of bias in surveys. It is well established that instrument error [34, 40] and respondent error (e.g., [21]) can yield meaningful differences in results. For example, variations in response order, response scales, descriptive text, or images used in a survey can lead to instrument error which can result in skewed response distributions. Certain types of questions can trigger other instrument error biases, such as the tendency to agree with statements presented in an agree/disagree format (acquiescence bias) or the hesitancy to admit undesirable behaviors or overreport desirable behaviors (social desirability bias). Respondent error is largely related to the amount of cognitive effort required to answer a survey and arises when respondents are either unable or unwilling to exert the required effort [21].
Such measurement error has been compared across survey modes, such as face-to-face, telephone, and Internet (e.g., [9, 18]), but little work has compared different Internet samples, such as crowdsourcing task platforms (e.g., Amazon's Mechanical Turk), paywall surveys (e.g., Google Consumer Surveys), opt-in panels (e.g., Survey Sampling International), and probability-based panels (e.g., the GfK KnowledgePanel). Because these samples differ in recruiting, context, and incentives, respondents may be more or less motivated to effortfully respond to questions, leading to different degrees of bias in different samples. The specific instruments deployed to respondents in these different modes can also exacerbate the situation by requiring more or less cognitive effort to answer satisfactorily.
The present study has two goals:
Investigate the impact of question wording on response distributions in order to measure the strength of common survey biases arising from instrument and respondent error
Compare variation in the degree of these biases across Internet survey samples with differing characteristics, in order to determine whether certain types of samples are more susceptible to certain biases than others (a simple illustration follows).
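As a simple illustration of the second goal (all endorsement rates below are invented), one common bias estimate compares endorsement of the same construct across question formats, per sample provider; acquiescence appears as excess endorsement in the agree/disagree format:

# Hypothetical endorsement rates for the same construct asked two ways,
# per sample provider (invented numbers, not the study's data).
samples = {
    "crowdsourcing":     {"agree_disagree": 0.72, "forced_choice": 0.58},
    "paywall_survey":    {"agree_disagree": 0.66, "forced_choice": 0.57},
    "opt_in_panel":      {"agree_disagree": 0.70, "forced_choice": 0.60},
    "probability_panel": {"agree_disagree": 0.61, "forced_choice": 0.56},
}

# Acquiescence estimate: excess endorsement in the agree/disagree format.
for name, rates in samples.items():
    bias = rates["agree_disagree"] - rates["forced_choice"]
    print(f"{name:18s} acquiescence estimate = {bias:+.2f}")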
Survey Research in HCI
Elizabeth Ferrall-Nunge
Ways of Knowing in HCI, Springer, New York, NY, USA (2014)
Abstract
Surveys, now commonplace on the Internet, allow researchers to make inferences about an entire population by gathering information from a small subset of the larger group. Surveys can gather insights about people's attitudes, perceptions, intents, habits, awareness, experiences, and characteristics, at significant moments both in time and over time. Even though they are easy to administer, there is a wide gap between quick-and-dirty surveys and surveys that are properly planned, constructed, and analyzed.
Online Microsurveys for User Experience Research
Victoria Schwanda Sosik
Gueorgi Kossinets
Kerwell Liao
Paul McDonald
CHI '14 Extended Abstracts on Human Factors in Computing Systems (2014)
HaTS: Large-scale In-product Measurement of User Attitudes & Experiences with Happiness Tracking Surveys
Proceedings of the 26th Australian Computer-Human Interaction Conference (OzCHI 2014), ACM, New York, NY, USA, pp. 308-315
Abstract
With the rise of Web-based applications, it is both important and feasible for human-computer interaction practitioners to measure a product's user experience. While quantifying user attitudes at a small scale has been heavily studied, in this industry case study we detail Happiness Tracking Surveys (HaTS), a method for collecting attitudinal data at a large scale directly in the product and over time. This method was developed at Google to track attitudes and open-ended feedback over time, and to characterize products' user bases. This case study of HaTS goes beyond the design of the questionnaire to also suggest best practices for appropriate sampling, invitation techniques, and data analysis. HaTS has been deployed successfully across dozens of Google's products to measure progress towards product goals and to inform product decisions; its sensitivity to product changes has been demonstrated widely. We are confident that teams in other organizations will be able to embrace HaTS as well and, if necessary, adapt it for their unique needs.
Designing Unbiased Surveys for HCI Research
Elizabeth Ferrall-Nunge
CHI '14 Extended Abstracts on Human Factors in Computing Systems, ACM, New York, NY, USA (2014), pp. 1027-1028
Abstract
Surveys are a commonly used method within HCI research. While it initially appears easy and inexpensive to conduct surveys, overlooking key considerations in questionnaire design and the survey research process can yield skewed, biased, or entirely invalid survey results. Fortunately, decades of academic research and analysis exist on optimizing the validity and reliability of survey data, from which this course will draw. To enable the creation of unbiased surveys, this course demonstrates questionnaire design biases and pitfalls, provides best practices for minimizing these, and reviews different uses of surveys within HCI.
A Comparison of Six Sample Providers Regarding Online Privacy Benchmarks
Sebastian Schnorf
Allison Woodruff
SOUPS Workshop on Privacy Personas and Segmentation (2014)
Abstract
Researchers increasingly utilize online tools to gather insights. We show how privacy comfort as measured by questionnaires differs across various survey sample providers. To investigate potential differences depending on provider, we fielded a small set of privacy-related benchmark questions regarding past experience, present and future concerns to six major US survey providers. We found substantial differences depending on privacy benchmark and provider population, illustrating that privacy-related research may yield different insights depending on provider choice.
Minimizing change aversion for the Google Drive launch
CHI'13 Extended Abstracts on Human Factors in Computing Systems, ACM, New York, NY, USA (2013), pp. 2351-2354
Abstract
Change aversion is a natural response, which technology often exacerbates. Evolutionary changes can be subtle and occur over many generations. But Internet users must sometimes deal with sudden, significant product changes to applications they rely on and identify with. Despite the best intentions of designers and product managers, users often experience anxiety and confusion when faced with a new interface or changed functionality. While some change aversion is often inevitable, it can also be managed and minimized with the right steps. This case study describes how our understanding of change aversion helped minimize negative effects during the transition from the Google Docs List to Google Drive, a product for file storage in the cloud. We describe the actions that allowed for a launch with minimal aversion.
Are privacy concerns a turn-off? Engagement and privacy in social networks
Jessica Staddon
Larkin Brown
Symposium on Usable Privacy and Security (SOUPS), ACM (2012)
Preview abstract
We describe the survey results from a representative sample of 1,075 U.S. social network users who use Facebook as their primary network. Our results show a strong association between low engagement and privacy concern. Specifically, users who report concerns around sharing control, comprehension of sharing practices, or general Facebook privacy concern also report consistently less time spent as well as less (self-reported) posting, commenting, and “Like”ing of content. The limited evidence of other significant differences between engaged users and others suggests that privacy-related concerns may be an important gate to engagement. Indeed, privacy concern and network size are the only malleable attributes that we find to have significant association with engagement. We manually categorize the privacy concerns, finding that many are nonspecific and not associated with negative personal experiences. Finally, we identify some education and utility issues associated with low social network activity, suggesting avenues for increasing engagement amongst current users.
Abstract
When designing online surveys, researchers must choose from a variety of pagination options. Respondents' expectations, experiences, and behaviors may vary depending on a survey's pagination, affecting both breakoffs and responses themselves. Surprisingly little formal experimentation has been conducted on the effects of survey pagination, with initial evidence focused on a long survey of university students (Peytchev, Couper, McCabe, & Crawford, 2006). This experiment is intended to further inform the effects of pagination in online surveys. In a split-ballot experiment, we served respondents one of three versions of a short online questionnaire (~15 questions) on attitudes and experiences toward an online product, constructed with a) one question per page, b) logical groupings of questions over several pages, and c) as few pages as possible. Effects of pagination are evaluated on breakoff rates, response time, item and unit nonresponse, interitem correlations, and perceived length/difficulty. We hypothesize that the questionnaire with the fewest (longest) pages will cause greater initial breakoff, and the one with the most pages will suffer increased breakoff during the survey.
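A minimal simulation sketch of this split-ballot design and its breakoff hypotheses (the breakoff probabilities below are invented purely to illustrate the analysis, not findings):

import random

VERSIONS = ["one_per_page", "logical_groups", "fewest_pages"]
TOTAL_QUESTIONS = 15
# Invented: longer pages raise initial breakoff; more pages raise
# mid-survey breakoff, per the stated hypotheses.
P_INITIAL  = {"one_per_page": 0.05, "logical_groups": 0.07, "fewest_pages": 0.12}
P_PER_STEP = {"one_per_page": 0.010, "logical_groups": 0.006, "fewest_pages": 0.003}

def simulate(n=3000):
    records = []
    for _ in range(n):
        version = random.choice(VERSIONS)  # split-ballot random assignment
        answered = 0
        if random.random() >= P_INITIAL[version]:  # survived initial breakoff
            for _ in range(TOTAL_QUESTIONS):
                if random.random() < P_PER_STEP[version]:
                    break  # mid-survey breakoff
                answered += 1
        records.append((version, answered))
    return records

records = simulate()
for v in VERSIONS:
    counts = [a for ver, a in records if ver == v]
    breakoff = sum(1 for a in counts if a < TOTAL_QUESTIONS) / len(counts)
    print(f"{v:15s} breakoff rate = {breakoff:.2%}")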