Yongwei Yang

Yongwei Yang is a researcher at Google. He works on (1) user and consumer research, (2) public perceptions of AI, (3) integrating AI into research methods and processes, and (4) the attitude-behavior linkage and its implications for business goal-setting and impact evaluation. Yongwei also conducts foundational methodological research on collecting better data and making better use of data, especially with surveys, psychological measurement, and behavioral signals. He is passionate about using his expertise to create a positive impact and to help others become effective users of research. Yongwei holds a Ph.D. in Quantitative and Psychometric Methods from the University of Nebraska-Lincoln.
Authored Publications
Google Publications
Other Publications
    Abstract: It is a common practice in market research to set up cross-sectional survey trackers. Although many studies have investigated the accuracy of non-probability-based online samples, less is known about their test-retest reliability, which is of key importance for such trackers. In this study, we assessed how stable measurement is over short periods of time, so that any changes observed over long periods in survey trackers could be attributed to true changes in sentiment rather than to sample artifacts. To achieve this, we repeated the same 10-question survey of 1,500 respondents two weeks apart in four different U.S. non-probability-based samples: Qualtrics panels, representing a typical non-probability-based online panel; Google Surveys, representing a river sampling approach; Google Opinion Rewards, representing a mobile panel; and Amazon MTurk, not a survey panel in itself but used de facto as one in academic research. To quantify test-retest reliability, we compared the response distributions from the two survey administrations. Given that the attitudes measured were not expected to change over a short timespan, and that no relevant external events were reported during fielding that could have affected them, the assumption was that the two measurements should be very close to each other, aside from transient measurement error. We found that two of the samples produced remarkably consistent results between the two survey administrations, one sample was less consistent, and the fourth sample had significantly different response distributions for three of the four attitudinal questions. This study sheds light on the suitability of different non-probability-based samples for cross-sectional attitude tracking.
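The distributional comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual analysis: the per-option counts are invented, and a Pearson chi-square test of homogeneity is just one reasonable way to compare two waves' response distributions.

```python
# Sketch: compare the answer-option distributions of the same question
# asked in two survey waves. Counts are invented for illustration.

def chi_square_homogeneity(wave1, wave2):
    """Pearson chi-square test of homogeneity for two per-option
    response-count vectors. Returns (statistic, degrees of freedom)."""
    n1, n2 = sum(wave1), sum(wave2)
    total = n1 + n2
    stat = 0.0
    for c1, c2 in zip(wave1, wave2):
        pooled = (c1 + c2) / total          # pooled proportion for this option
        e1, e2 = n1 * pooled, n2 * pooled   # expected counts if waves agree
        stat += (c1 - e1) ** 2 / e1 + (c2 - e2) ** 2 / e2
    return stat, len(wave1) - 1

# Hypothetical 5-point-scale counts from 1,500 respondents per wave.
wave1 = [150, 300, 450, 400, 200]
wave2 = [160, 290, 440, 410, 200]
stat, df = chi_square_homogeneity(wave1, wave2)
CRIT_95_DF4 = 9.488  # chi-square critical value at alpha = .05, df = 4
print(f"chi2({df}) = {stat:.2f}; consistent across waves: {stat < CRIT_95_DF4}")
```

Other distance measures (e.g., total variation distance) would serve the same purpose; the point is only that test-retest stability here is a property of the response distributions, not of individual respondents.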
    Abstract: Survey communities have regularly discussed optimal questionnaire design for attitude measurement. Specifically for consumer satisfaction, which has historically been treated as a bipolar construct (Thurstone, 1931; Likert, 1932), some argue it is actually two separate unipolar constructs, which may yield signals with separable and interactive dynamics (Cacioppo & Berntson, 1994). Earlier research has explored whether attitude measurement validity can be optimized with a branching design that involves two questions: a question about the direction of an attitude (e.g., positive, negative) followed by a question, using a unipolar scale, about the intensity of the selected direction (Krosnick & Berent, 1993). The current experiment evaluated differences across a variety of question designs for in-product contextual satisfaction surveys (Sedley & Müller, 2016). Specifically, we randomly assigned respondents to one of the following designs: (1) a traditional 5-point bipolar satisfaction scale (fully labeled); (2) branched: a directional question (satisfied, neither satisfied nor dissatisfied, dissatisfied), followed by a unipolar question on intensity (5-point scale from “not at all” to “extremely,” fully labeled); (3) a unipolar satisfaction scale followed by a unipolar dissatisfaction scale (both 5-point scales from “not at all” to “extremely,” fully labeled); (4) a unipolar dissatisfaction scale followed by a unipolar satisfaction scale (both 5-point scales from “not at all” to “extremely,” fully labeled). The experiment adds to the attitude question design literature by evaluating designs based on criterion validity evidence, namely the relationship with user behaviors linked to survey responses. Results show that no format clearly outperformed the traditional bipolar scale format for the criteria included. Separate unipolar scales performed poorly, and may be awkward or annoying for respondents. Branching, while performing similarly to the traditional bipolar design, showed no gain in validity; it is thus also not desirable, because it requires two questions instead of one, increasing respondent burden.
    REFERENCES
    Cacioppo, J. T., & Berntson, G. G. (1994). Relationship between attitudes and evaluative space: A critical review, with emphasis on the separability of positive and negative substrates. Psychological Bulletin, 115, 401-423.
    Krosnick, J. A., & Berent, M. K. (1993). Comparisons of party identification and policy preferences: The impact of survey question format. American Journal of Political Science, 37, 941-964.
    Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73, 304-324.
    O’Muircheartaigh, C., Gaskell, G., & Wright, D. B. (1995). Weighing anchors: Verbal and numeric labels for response scales. Journal of Official Statistics, 11, 295-308.
    Wang, R., & Krosnick, J. A. (2020). Middle alternatives and measurement validity: A recommendation for survey researchers. International Journal of Social Research Methodology, 23, 169-184.
    Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
    Thurstone, L. L. (1931). Rank order as a psychological method. Journal of Experimental Psychology, 14, 187-201.
    Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22, 5-55.
    Sedley, A., & Müller, H. (2016, May). User experience considerations for contextual product surveys on smartphones. Paper presented at the 71st annual conference of the American Association for Public Opinion Research, Austin, TX. Retrieved from https://ai.google/research/pubs/pub46422/
    “Mixture of amazement at the potential of this technology and concern about possible pitfalls”: Public sentiment towards AI in 15 countries
    Patrick Gage Kelley
    Christopher Moessner
    Aaron M Sedley
    Allison Woodruff
    Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 44 (2021), pp. 28-46
    Abstract: Public opinion plays an important role in the development of technology, influencing product adoption, commercial development, research funding, career choices, and regulation. In this paper we present results of an in-depth survey of public opinion of artificial intelligence (AI) conducted with over 17,000 respondents spanning fifteen countries and six continents. Our analysis of open-ended responses regarding sentiment towards AI revealed four key themes (exciting, useful, worrying, and futuristic) which appear to varying degrees in different countries. These sentiments, and their relative prevalence, may inform how the public influences the development of AI.
    Exciting, Useful, Worrying, Futuristic: Public Perception of Artificial Intelligence in 8 Countries
    Patrick Gage Kelley
    Christopher Moessner
    Aaron Sedley
    Andreas Kramm
    David T. Newman
    Allison Woodruff
    AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (2021), 627–637
    Abstract: As the influence and use of artificial intelligence (AI) have grown and its transformative potential has become more apparent, many questions have been raised regarding the economic, political, social, and ethical implications of its use. Public opinion plays an important role in these discussions, influencing product adoption, commercial development, research funding, and regulation. In this paper we present results of an in-depth survey of public opinion of artificial intelligence conducted with 10,005 respondents spanning eight countries and six continents. We report widespread perception that AI will have significant impact on society, accompanied by strong support for the responsible development and use of AI, and also characterize the public’s sentiment towards AI with four key themes (exciting, useful, worrying, and futuristic) whose prevalence distinguishes response to AI in different countries.
    Scaling the smileys: A multicountry investigation
    Aaron Sedley
    Joseph M. Paxton
    The Essential Role of Language in Survey Research, RTI Press (2020), pp. 231-242
    Abstract: Contextual user experience (UX) surveys are brief surveys embedded in a website or mobile app (Sedley & Müller, 2016). In these surveys, emojis (e.g., smiley faces, thumbs, stars), with or without text labels, are often used as answer scales. Previous investigations in the United States found that carefully designed smiley faces may distribute fairly evenly along a numerical scale (0–100) for measuring satisfaction (Sedley, Yang, & Hutchinson, 2017). The present study investigated the scaling properties and construct meaning of smiley faces in six countries. We collected open-ended descriptions of smileys to understand construct interpretations across countries. We also assessed numeric meaning of a set of five smiley faces on a 0–100 range by presenting each face independently, as well as in context with other faces with and without endpoint text labels.
    Assessing the validity of inferences from scores on the cognitive reflection test
    Nikki Blacksmith
    Tara S. Behrend
    Gregory A. Ruark
    Journal of Behavioral Decision Making, vol. 32 (2019), pp. 599-612
    Abstract: Decision-making researchers purport that a novel cognitive ability construct, cognitive reflection, explains variance in intuitive thinking processes that traditional mental ability constructs do not. However, researchers have questioned the validity of the primary measure because of poor construct conceptualization and a lack of validity studies. Prior studies have not adequately aligned their analytical techniques with the theoretical basis of the construct, the dual-processing theory of reasoning. The present study assessed the validity of inferences drawn from cognitive reflection test (CRT) scores. We analyzed response processes with an item response tree model, a method that aligns with dual-processing theory, in order to interpret CRT scores. Findings indicate that the intuitive and reflective factors that the test purportedly measures were indistinguishable. Exploratory, post hoc analyses demonstrate that CRT scores are most likely capturing mental abilities. We suggest that future researchers recognize and distinguish between individual differences in cognitive abilities and cognitive processes.
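The pseudo-item recoding that underlies an item-response-tree (IRTree) analysis of CRT data can be sketched as follows. This is a hedged illustration: the two-node tree (suppress the intuitive lure, then reach the reflective answer) is one common IRTree specification for dual-process data and may not match the paper's exact model; the response data are invented.

```python
# Sketch of the pseudo-item recoding behind an IRTree analysis of CRT
# responses. Category names and the tree structure are illustrative.

LURE, CORRECT, OTHER = "lure", "correct", "other"

def recode(response):
    """Map one CRT response category to two pseudo-items.
    Node 1: 1 if the intuitive lure was suppressed, else 0.
    Node 2: 1 if the reflective (correct) answer was reached; only
    defined when the lure was suppressed, else None (structurally
    missing)."""
    suppressed = 0 if response == LURE else 1
    reflective = None if response == LURE else int(response == CORRECT)
    return suppressed, reflective

# Hypothetical responses from one respondent to three CRT items.
responses = [LURE, CORRECT, OTHER]
pseudo = [recode(r) for r in responses]
print(pseudo)  # [(0, None), (1, 1), (1, 0)]
```

Fitting item response models to the two sets of pseudo-items then yields separate "intuitive" and "reflective" factors, whose distinguishability is what the paper tests.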
    Response Option Order Effects in Cross-Cultural Context: An experimental investigation
    Rich Timpone
    Marni Hirschorn
    Vlad Achimescu
    Maribeth Natchez
    2019 Conference of the European Association for Survey Research (ESRA), Zagreb (2019) (to appear)
    Abstract: A response option order effect occurs when different orders of rating scale response options lead to different distributions or functioning of survey questions. Theoretical interpretations, notably satisficing, memory bias (Krosnick & Alwin, 1987), and anchor-and-adjustment (Yan & Keusch, 2015), have been used to explain such effects. Visual interpretive heuristics (especially “left-and-top-mean-first” and “up-means-good”) may also provide insights into how the positioning of response options can affect answers (Tourangeau, Couper, & Conrad, 2004, 2013). Most existing studies of the response option order effect were conducted in mono-cultural settings. However, the presence and extent of the effect may be shaped by “cultural” factors in a few ways. First, interpretive heuristics such as “left-means-first” may work differently under varying reading conventions (e.g., left-to-right vs. right-to-left). Furthermore, people within cultures that have multiple primary languages and multiple reading conventions might possess different positioning heuristics. Finally, respondents from different countries may have varying degrees of exposure and familiarity with a specific type of visual design. In this experimental study, we investigate the rating scale response option order effect across three countries with different reading conventions and industry norms for answer scale designs -- the US, Israel, and Japan. The between-subject factor of the experiment consists of four combinations of scale orientation (vertical and horizontal) and the positioning of the positive end of the scale. The within-subject factors are question topic area and the number of scale points. The effects of device (smartphone vs. desktop computer/tablet), age, gender, education, and degree of exposure to left-to-right content will also be evaluated. We incorporate a range of analytical approaches: distributional comparisons, analysis of response latency and paradata, and latent structure modeling. We will discuss implications for choosing response option orders for mobile surveys and for comparing data obtained from different response option orders.
    Abstract: Contextual user experience (UX) surveys are brief surveys embedded in a website or mobile app and triggered during or after a user-product interaction. They are used to measure user attitude and experience in the context of actual product usage. In these surveys, smiley faces (with or without verbal labels) are often used as answer scales for questions measuring constructs such as satisfaction. From studies done in the US in 2016 and 2017, we found that carefully designed smiley faces may distribute fairly evenly along a numerical scale (0-100) and that scaling properties further improved with endpoint verbal labels (Sedley, Yang, & Hutchinson, presented at AAPOR 2017). With the propagation of mobile app products around the world, the survey research community is compelled to test the generalizability of single-population findings (often from the US) to cross-national, cross-language, and cross-cultural contexts. The current study builds upon the above scaling study as well as work by cross-cultural survey methodologists who investigated meanings of verbal scales (e.g., Smith, Mohler, Harkness, & Onodera, 2005). We investigate the scaling properties of smiley faces in a number of distinct cultural and language settings: US (English), Japan (Japanese), Germany (German), Spain (Spanish), India (English), and Brazil (Portuguese). Specifically, we explore construct alignment by capturing respondents’ own interpretations of the smiley face variants, via open-ended responses. We also assess scaling properties of various smiley designs by measuring each smiley face on a 0-100 scale, to calculate semantic distance between smileys. This is done by presenting each smiley face both independently and in context with other smileys. We additionally evaluate the effect of including verbal endpoint labels with the smiley scale.
    From Big Data to Big Analytics: Automated Analytic Platforms for Data Exploration
    Jonathan Kroening
    Rich Timpone
    BigSurv 18 (Big Data Meet Survey Science) conference, Barcelona, Spain (2018)
    Abstract: As Big Data has altered the face of research, the same factors of Volume, Velocity, and Variety used to define it are changing the opportunities for analytic data exploration as well; thus the introduction of the term Big Analytics. Improvements in algorithms and computing power provide the foundation for automated platforms that can identify patterns in analytic model results, beyond simply looking at patterns in the data itself. Introducing the class of Automated Analysis Insight Exploration Platforms allows running tens or hundreds of thousands of statistical models and exploring their results to identify systematic changes in dynamic environments that would otherwise often be missed. These techniques are designed to extract more value out of both traditional survey data and Big Data, and are relevant for academic, industry, governmental, and NGO exploration of new insights into changing patterns of attitudes and behaviors. This paper discusses the architecture of our Ipsos Research Insight Scout (IRIS) and then provides examples of it in action to identify insights for scientific and practical discovery in public opinion and business data. From the Ipsos Global Advisor Study we show examples from the U.S. withdrawal from the Paris Agreement and the 2016 presidential election. We then show with an example how a research project at Google is leveraging these platforms to inform business decision-making.
    Abstract: With increased adoption and usage of mobile apps for a variety of purposes, it is important to establish attitudinal measurement designs that measure users’ experiences in the context of actual app usage. Such designs should balance mobile UX considerations with survey data quality. To inform choices on contextual mobile survey design, we conduct a comparative evaluation of stars vs. smileys as graphical scales for in-context mobile app satisfaction measurement, as follows. To evaluate and compare data quality across scale types, we look at the distributions of the numerical ratings by anchor point stimulus to evaluate extremity and scale point distances. We also assess criterion validity for stars and smileys, where feasible. To evaluate user experience across variants, we compare key survey-related signals such as response and dismiss rates, the dismiss/response ratio, and time-to-response.
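The survey-related signals named above (response and dismiss rates, dismiss/response ratio, time-to-response) are straightforward to compute from an invitation log. A minimal sketch, with invented event records and field names:

```python
# Sketch of the survey-UX metrics named in the abstract, computed from a
# hypothetical log of survey invitations. Field names and values are
# invented for illustration.

def survey_metrics(events):
    """events: list of dicts with 'outcome' ('responded' | 'dismissed' |
    'ignored') and, for responses, 'seconds' until the response."""
    n = len(events)
    responded = [e for e in events if e["outcome"] == "responded"]
    dismissed = [e for e in events if e["outcome"] == "dismissed"]
    return {
        "response_rate": len(responded) / n,
        "dismiss_rate": len(dismissed) / n,
        "dismiss_response_ratio": len(dismissed) / len(responded),
        "median_time_to_response": sorted(
            e["seconds"] for e in responded)[len(responded) // 2],
    }

events = (
    [{"outcome": "responded", "seconds": s} for s in (4, 6, 9)]
    + [{"outcome": "dismissed"}] * 2
    + [{"outcome": "ignored"}] * 5
)
print(survey_metrics(events))
```

In practice these signals would be compared across the star and smiley variants, alongside the distributional data-quality checks.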
    Justice Rising - The Growing Ethical Importance of Big Data, Survey Data, Models and AI
    Rich Timpone
    BigSurv 18 (Big Data Meet Survey Science) conference, Barcelona, Spain (2018)
    Abstract: In past work, the criteria of Truth, Beauty, and Justice have been leveraged to evaluate models (Lave and March 1993; Taber and Timpone 1996). Earlier, while relevant, Justice was seen as the least important of modeling considerations, but that is no longer the case. As the nature of data and computing power have opened new opportunities for the application of data and algorithms, from public policy decision-making to technological advances like self-driving cars, ethical considerations have become far more important in the work that researchers are doing. While a growing literature has been highlighting the ethical concerns of Big Data, algorithms, and artificial intelligence, we take a practical approach of reviewing how decisions throughout the research process can result in unintended consequences in practice. Building on Gawande’s (2009) approach of using checklists to reduce risks, we have developed an initial framework and set of checklist questions for researchers to explicitly consider the ethical implications of their analytic endeavors. While many of the aspects considered are tied to Truth and accuracy, our examples show that considering research design through the lens of Justice may lead to different research choices. The checklists include questions on the collection of data (Big Data and survey data, including sources and measurement), how it is modeled, and, finally, issues of transparency. These issues are of growing importance for practitioners from academia to industry to government, and attending to them will allow us to advance the intended goals of our scientific and practical endeavors while avoiding potential risks and pitfalls.
    The Role of Surveys in the Era of “Big Data”
    The Palgrave handbook of Survey Research, Palgrave (2018), pp. 175-192
    Abstract: Survey data have recently been compared and contrasted with so-called “Big Data” and some observers have speculated about how Big Data may eliminate the need for survey research. While both Big Data and survey research have a lot to offer, very little work has examined the ways that they may best be used together to provide richer datasets. This chapter offers a broad definition of Big Data and proposes a framework for understanding how the benefits and error properties of Big Data and surveys may be leveraged in ways that are complementary. This chapter presents several of the opportunities and challenges that may be faced by those attempting to bring these different sources of data together.
    To Smiley, Or Not To Smiley? Considerations and Experimentation to Optimize Data Quality and User Experience for Contextual Product Satisfaction Measurement
    Aaron Sedley
    https://docs.google.com/a/google.com/presentation/d/e/2PACX-1vQMmPQ6xeyUWbA_tey23GiXJ8SUdZWn8FiL5E5x7BGrKOLe7Im8UnXOfRxBkFB0OYo_7ioovOpVztB1/pub?start=false&loop=false&delayms=5000 (2017)
    Abstract: Happiness Tracking Surveys (HaTS) at Google are designed to measure satisfaction with a product or feature in context of actual usage. Smiley faces have been added to a fully-labeled satisfaction scale, to increase discoverability of the survey and response rates. Sensitive to the potential variety of effects from images and visual presentation in online surveys (Tourangeau, Conrad & Couper, 2013), this presentation describes research designed to inform and optimize Google's use of smileys in Happiness Tracking Surveys across products and platforms: 1) We explore construct alignment by capturing users' interpretations of the various smiley faces, via open-ended responses. This data shows meaningful variation across potential smiley images, which informed design decisions. 2) We assess scaling properties of smileys by measuring each smiley independently on a 0-100 scale, to calculate semantic distance between smileys in order to achieve equally-spaced intervals between scale points (Klockars & Yamagishi, 1988). 3) We describe considerations and evaluative metrics for a smiley-based scale with endpoint text labels, to be used with mobile apps and devices.
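The scaling step in point 2 -- rating each smiley independently on a 0-100 scale and checking the spacing between mean ratings -- can be sketched as follows. The ratings and face names below are invented for illustration:

```python
# Sketch of the smiley scaling analysis: each face is rated on a 0-100
# scale, and gaps between mean ratings indicate whether the faces are
# roughly equally spaced. All ratings here are invented.

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-respondent 0-100 ratings for five candidate smileys,
# ordered from most negative to most positive face.
ratings = {
    "very_sad":   [2, 5, 8, 4],
    "sad":        [22, 28, 25, 27],
    "neutral":    [48, 52, 50, 49],
    "happy":      [73, 77, 74, 75],
    "very_happy": [95, 97, 96, 98],
}

means = {face: mean(r) for face, r in ratings.items()}
faces = list(means)
gaps = [means[b] - means[a] for a, b in zip(faces, faces[1:])]
print(means)
print("gaps between adjacent faces:", gaps)  # near-equal gaps suggest an
                                             # equal-interval scale
```

Faces whose mean ratings produce markedly unequal gaps would be candidates for redesign before being used as scale points.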
    Abstract: Grids (or matrix/table questions) are commonly used in self-administered surveys. In order to optimize online surveys for smartphones, grid designs aimed at small-screen devices are emerging. In this study we investigate four research questions about the effectiveness and drawbacks of different grid designs; more specifically, does grid design affect: (1) data quality, as indicated by breakoffs, satisficing behaviors, and response errors; (2) response time; (3) response distributions; and (4) inter-relationships among questions? We conducted two experiments. The first experiment was conducted in April 2016 in Brazil, the US, and Germany. We tested a progressive grid, a responsive grid, and a collapsible grid. Results were analyzed for desktops/laptops only, due to the small number of respondents who took the study via smartphones. We found the collapsible grid elicited the highest number of error prompts for item nonresponse. The second experiment was fielded in August 2016, testing grid designs on three types of answer scales: a 7-point fully-labeled rating scale, a 5-point fully-labeled rating scale, and a 6-point fully-labeled frequency scale. Respondents from the US and Japan to an online survey were randomly assigned to one of three conditions: (a) no grid, where each question was presented on a separate screen; (b) responsive grid, where a grid is shown on large screens and as a single-column vertical table on small screens (with the question stem fixed as a header); (c) progressive grid, where grouped questions were presented screen-by-screen with the question stem and sub-questions (stubs) fixed on top. Quotas were enforced so that half of the respondents completed the survey on large-screen devices (desktop/tablet computers) and the other half on smartphones. There were 600 respondents per grid condition per screen size per country. Findings showed that the progressive grid had less straightlining and fewer response errors, whereas the responsive grid had fewer break-offs. Differences were also found between grid designs in response time and response distributions; however, patterns varied by country, screen size, and answer scale. Further analysis will explore the effect of grid design on question inter-relationships. While visual and interactive features impact the utility of grid designs, we found that the effects might vary by question type, screen size, and country. More experiments are needed to explore designs truly optimized for online surveys.
    Abstract: Survey research is increasingly conducted using online panels and river samples. With a large number of data suppliers available, data purchasers need to understand the accuracy of the data being provided and whether probability sampling continues to yield more accurate measurements of populations. This paper evaluates the accuracy of a probability sample and of non-probability survey samples that were created using different quota sampling strategies and sample sources (panel versus river samples). Data collection was organized by the Advertising Research Foundation (ARF) in 2013. We compare estimates from 45 U.S. online non-probability panel samples, 6 river samples, and one RDD telephone sample to high-quality benchmarks: population estimates obtained from large-scale face-to-face surveys of probability samples with extremely high response rates (e.g., ACS, NHIS, and NHANES). The non-probability samples were supplied by 17 major U.S. providers. Online respondents were directed to a third-party website where the same questionnaire was administered. The online samples were created using three quota methods: (A) age and gender within regions; (B) Method A plus race/ethnicity; and (C) Method B plus education. Mean questionnaire completion time was 26 minutes, and the average sample size was 1,118. Comparisons are made using unweighted and weighted data, with weighting strategies of increasing complexity. Accuracy is evaluated using the absolute average error method, where the percentage of respondents who chose the modal category in the benchmark survey is compared to the corresponding percentage in each sample. The study illustrates the need for methodological rigor when evaluating the performance of survey samples.
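The absolute average error method described above can be sketched directly. All numbers and question names below are invented for illustration; the study's actual benchmarks came from surveys such as the ACS, NHIS, and NHANES.

```python
# Sketch of the absolute-average-error accuracy metric: for each
# benchmark question, take the benchmark's modal category and average
# the absolute gaps between the benchmark's and the sample's percentage
# in that category. All numbers here are invented.

def average_absolute_error(benchmark, sample):
    """benchmark, sample: dicts mapping question -> {category: percent}.
    Uses the benchmark's modal category for each question."""
    errors = []
    for q, bench_dist in benchmark.items():
        modal = max(bench_dist, key=bench_dist.get)  # benchmark's modal category
        errors.append(abs(bench_dist[modal] - sample[q][modal]))
    return sum(errors) / len(errors)

benchmark = {
    "smokes": {"yes": 18.0, "no": 82.0},
    "has_license": {"yes": 87.0, "no": 13.0},
}
online_sample = {
    "smokes": {"yes": 24.0, "no": 76.0},
    "has_license": {"yes": 80.0, "no": 20.0},
}
print(average_absolute_error(benchmark, online_sample))  # (6.0 + 7.0) / 2 = 6.5
```

Lower values indicate a sample whose estimates track the benchmark surveys more closely; the metric can be computed per weighting strategy to compare their effectiveness.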
    Transportability: Boundaries, Challenges, Standards
    Theodore L. Hayes
    Cheryl Fernandez
    Kurt F. Geisinger
    Nancy T. Tippins
    (2015)
    Adjusting survey mode differences: Illustration of a linear equating method
    Sangeeta Agrawal
    2015 American Statistical Association Conference on Statistical Practice (CSP)
    Measurement equivalence of a concise customer engagement metric across country, language, and customer types
    Dan Yu
    Public Opinion Quarterly, vol. 79 (2015), pp. 325-358
    Targeted recruitment: Exploring job preferences across applicant profiles
    Nikki Blacksmith
    Nate Dvorak
    29th annual Society for Industrial and Organizational Psychology conference (2014)
    Assessments (truly) enhanced by technology: Rationale, validity, and value
    Stephen G. Sireci
    Theodore L. Hayes
    28th annual Society for Industrial and Organizational Psychology conference (2013)
    Attracting high quality nurses: Both message and channel matter
    Nikki Blacksmith
    Cheryl Fernandez
    28th annual Society for Industrial and Organizational Psychology conference (2013)
    “Big Data” technology: Problem or Solution?
    Tiffany Poeppelman
    Nikki Blacksmith
    The Industrial-Organizational Psychologist, vol. 51(2) (2013), pp. 119-126
    Current best practices in item selection
    Tzuyun Chin
    Anja Römhild
    8th conference of the International Test Commission (2012)
    International generalizability of organizational surveys
    Sangeeta Agrawal
    James K. Harter
    Fourth International Conference on Establishment Surveys (2012)
    Retesting in employee selection: Lessons learned from multi-country unproctored internet-based tests
    Tzuyun Chin
    Anna Truscott-Smith
    Nikki Blacksmith
    8th conference of the International Test Commission (2012)
    Evaluating non-response bias in organizational surveys using response timing
    Stephanie K. Plowman
    Sangeeta Agrawal
    James K. Harter
    Fourth International Conference on Establishment Surveys (2012)
    A Multi-faceted look at context effects
    Sangeeta Agrawal
    James K. Harter
    Dan Witters
    77th annual meeting of the Psychometric Society (2012)
    Applicant attraction: Understanding preferences of high quality applicants
    Nikki Blacksmith
    Joseph H. Streur
    26th annual Society for Industrial and Organizational Psychology conference (2011)
    Classification accuracy of diagnostic methods: A simulation study
    Tzuyun Chin
    Kurt Geisinger
    2011 National Council of Measurement in Education annual conference
    One size doesn’t fit all: Differences in the attractiveness of organizational characteristics among job seekers
    Nikki Blacksmith
    54th annual conference of the Midwest Academy of Management (2011)
    Response styles and culture
    Janet A. Harkness
    Tzuyun Chin
    Ana Villar
    Survey Methods in Multicultural, Multinational, and Multiregional Contexts, John Wiley and Sons (2010)
    Development and validation of a selection tool that predicts engagement
    Nikki Blacksmith
    Joseph H. Streur
    Sangeeta Badal
    James K. Harter
    Paula Walker
    25th Annual Society for Industrial and Organizational Psychology conference (2010)
    Response latency as an indicator of optimizing in online questionnaires
    Dennison Bhola
    Don A. Dillman
    Katherine Chin
    Bulletin de Méthodologie Sociologique, vol. 103 (2009), pp. 5-25
    Partial invariance in loadings and intercepts: Their interplay and implications for latent mean comparisons
    Deborah L. Bandalos
    2008 annual meeting of the American Educational Research Association
    Exposed items detection in personnel selection assessment: An exploration of a new item statistic
    Abdullah Ferdous
    Tzuyun Chin
    2007 annual meeting of the National Council of Measurement in Education
    Detection of item degradation
    Abdullah Ferdous
    Tzuyun Chin
    22nd annual conference of the Society for Industrial and Organizational Psychology (2007)
    Evaluating guidelines for test adaptations: A methodological analysis of translation quality
    Stephen G. Sireci
    James K. Harter
    Eldin J. Ehrlich
    Journal of Cross-Cultural Psychology, vol. 37 (2006), pp. 557-567
    Evaluating computer automated scoring: Issues, methods, and an empirical illustration
    Chad W. Buckendahl
    Piotr J. Juszkiewicz
    Dennison S. Bhola
    Journal of Applied Testing Technology, vol. 7(3) (2005), pp. 1-43
    Problems of analyzing complex sample design data with common statistical software packages
    Tzuyun Chin
    2002 annual meeting of the American Educational Research Association (2003)
    Evaluating the equivalence of an employee attitude survey across languages, cultures, and administration formats
    Stephen G. Sireci
    James K. Harter
    Dennison S. Bhola
    International Journal of Testing, vol. 3 (2003), pp. 129-150
    A review of strategies for validating computer-automated scoring
    Chad W. Buckendahl
    Piotr J. Juszkiewicz
    Dennison S. Bhola
    Applied Measurement in Education, vol. 15 (2002), pp. 391-412