
Courtney Heldreth
Authored Publications
Indigenous languages are historically under-served by Natural Language Processing (NLP) technologies, but this is changing for some languages with the recent scaling of large multilingual models and an increased focus by the NLP community on endangered languages. This position paper explores ethical considerations in building NLP technologies for Indigenous languages, based on the premise that such projects should primarily serve Indigenous communities. We report on interviews with 17 researchers working in or with Aboriginal and/or Torres Strait Islander communities on language technology projects in Australia. Drawing on insights from the interviews, we recommend practices for NLP researchers to increase attention to the process of engagements with Indigenous communities, rather than focusing only on decontextualised artefacts.
Which Skin Tone Measures are the Most Inclusive? An Investigation of Skin Tone Measures for Machine Learning
Ellis Monk
X Eyee
ACM Journal of Responsible Computing (2024) (to appear)
Skin tone plays a critical role in artificial intelligence (AI), especially in biometrics, human sensing, computer vision, and fairness evaluations. However, many algorithms have exhibited unfair bias against people with darker skin tones, leading to misclassifications, poor user experiences, and exclusions in daily life. One reason this occurs is a poor understanding of how well the scales we use to measure and account for skin tone in AI actually represent the variation of skin tones in people affected by these systems. Although the Fitzpatrick scale has become the industry standard for skin tone evaluation in machine learning, its documented bias towards lighter skin tones suggests that other skin tone measures are worth investigating. To address this, we conducted a survey with 2,214 people in the United States to compare three skin tone scales: the Fitzpatrick 6-point scale, Rihanna’s Fenty™ Beauty 40-point skin tone palette, and a newly developed Monk 10-point scale from the social sciences. We find the Fitzpatrick scale is perceived to be less inclusive than the Fenty and Monk skin tone scales, and this was especially true for people from historically marginalized communities (i.e., people with darker skin tones, BIPOCs, and women). We also find no statistically meaningful differences in perceived representation between the Monk skin tone scale and the Fenty Beauty palette. Through this rigorous testing and validation of skin tone measurement, we discuss the ways in which our findings can advance the understanding of skin tone in both the social science and machine learning communities.
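As a hedged illustration of the kind of comparison the survey reports (not the authors' released analysis code), the sketch below contrasts perceived-representation ratings across the three scales; the file name, column names, and rating range are all hypothetical.

```python
# Illustrative sketch: comparing mean perceived-representation ratings across
# three skin tone scales, with a pairwise test between the Monk and Fenty
# conditions. Column names and the rating range are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("survey_responses.csv")  # hypothetical file: one row per rating
# Expected columns: "scale" in {"fitzpatrick", "fenty", "monk"}, "rating" (numeric)

by_scale = {name: grp["rating"] for name, grp in df.groupby("scale")}
print(df.groupby("scale")["rating"].agg(["mean", "std", "count"]))

# Omnibus test across the three scales
f_stat, p_omnibus = stats.f_oneway(*by_scale.values())
print(f"one-way ANOVA: F={f_stat:.2f}, p={p_omnibus:.4f}")

# Pairwise check: Monk vs. Fenty (the paper reports no meaningful difference)
t_stat, p_pair = stats.ttest_ind(by_scale["monk"], by_scale["fenty"], equal_var=False)
print(f"Monk vs. Fenty: t={t_stat:.2f}, p={p_pair:.4f}")
```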
Believing Anthropomorphism: Examining the Role of Anthropomorphic Cues on User Trust in Large Language Models
Michelle Cohn
Femi Olanubi
Zion Mengesha
Daniel Padgett
ACM (Association for Computing Machinery) CHI Conference on Human Factors in Computing Systems 2024 (2024)
People now regularly interface with Large Language Models (LLMs) via speech and text (e.g., Bard) interfaces. However, little is known about the relationship between how users anthropomorphize an LLM system (i.e., ascribe human-like characteristics to it) and how they trust the information it provides. Participants (n=2,165, aged 18 to 90, from the United States) completed an online experiment in which they interacted with a pseudo-LLM that varied in modality (text only vs. speech + text) and grammatical person (“I” vs. “the system”) in its responses. Results showed that the “speech + text” condition led to higher anthropomorphism of the system overall, as well as higher ratings of the accuracy of the information the system provides. Additionally, the first-person pronoun (“I”) led to higher perceived information accuracy and lower risk ratings, but only in one context. We discuss the implications of these findings for the design of responsible, human–generative AI experiences.
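The 2x2 design described above (modality by grammatical person) lends itself to a factorial analysis. Here is a minimal sketch under assumed variable names and a numeric accuracy rating; the study's actual materials and statistical models are not shown here.

```python
# Minimal sketch of analyzing a 2x2 between-subjects design like the one
# described above (modality x grammatical person); column names are
# hypothetical, not taken from the study materials.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("experiment_data.csv")  # hypothetical: one row per participant
# Expected columns: "modality" in {"text", "speech_text"},
# "person" in {"first_person", "system"}, and a numeric "accuracy_rating"

model = smf.ols("accuracy_rating ~ C(modality) * C(person)", data=df).fit()
print(model.summary())  # main effects plus the modality-by-person interaction
```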
This paper examines the adaptations African American English speakers make when imagining talking to a voice assistant, compared to talking to a close friend/family member and to a stranger. Results show that speakers slowed their rate and produced less pitch variation in voice-assistant-directed speech (DS), relative to human-DS. These adjustments were not mediated by how often participants reported experiencing errors with automatic speech recognition. Overall, this paper addresses a limitation in the types of language varieties explored when examining technology-DS registers and contributes to our understanding of the dynamics of human-computer interaction.
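For readers curious how speaking rate and pitch variation can be quantified, here is a rough sketch using librosa's pYIN pitch tracker; the recording name and syllable count are placeholders, and this is not the paper's measurement pipeline.

```python
# Hedged sketch of the acoustic measures discussed above: speaking rate and
# pitch (F0) variation for one utterance, using librosa's pYIN tracker.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)  # hypothetical recording
duration_s = len(y) / sr

# F0 contour over a typical speech range; unvoiced frames come back as NaN
f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
f0_voiced = f0[~np.isnan(f0)]
pitch_variation = np.std(f0_voiced)  # lower std ~ flatter, less varied pitch

n_syllables = 12  # placeholder: would come from a transcript or syllable-nuclei detector
speech_rate = n_syllables / duration_s  # syllables per second

print(f"pitch SD: {pitch_variation:.1f} Hz, rate: {speech_rate:.2f} syll/s")
```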
Consensus and Subjectivity of Skin Tone Annotation for ML Fairness
Ellis Monk
Femi Olanubi
Auriel Wright
(2023) (to appear)
Understanding different human attributes and how they affect model behavior may become a standard need for all model creation and usage, from traditional computer vision tasks to the newest multimodal generative AI systems. In computer vision specifically, we have relied on datasets augmented with perceived attribute signals (e.g., gender presentation, skin tone, and age) and benchmarks enabled by these datasets. Typically, labels for these tasks come from human annotators. However, annotating attribute signals, especially skin tone, is a difficult and subjective task. Perceived skin tone is affected by technical factors, like lighting conditions, and social factors that shape an annotator's lived experience. This paper examines the subjectivity of skin tone annotation through a series of annotation experiments using the Monk Skin Tone (MST) scale, a small pool of professional photographers, and a much larger pool of trained crowdsourced annotators. Along with this study, we release the Monk Skin Tone Examples (MST-E) dataset, containing 1515 images and 31 videos spread across the full MST scale. MST-E is designed to help train human annotators to annotate MST effectively. Our study shows that annotators can reliably annotate skin tone in a way that aligns with an expert in the MST scale, even under challenging environmental conditions. We also find evidence that annotators from different geographic regions rely on different mental models of MST categories, resulting in annotations that systematically vary across regions. Given this, we advise practitioners to use a diverse set of annotators and a higher replication count for each image when annotating skin tone for fairness research.
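To make the agreement and replication advice concrete, a small sketch follows. The MST labels here are invented, and weighted kappa is one common choice for ordinal agreement rather than the paper's specific metric.

```python
# Illustrative sketch (assumed setup, not the paper's released code): scoring
# annotator-expert agreement on the ordinal 10-point MST scale, then
# aggregating replicated annotations per image.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical data: expert MST labels and one annotator's labels (1-10)
expert = np.array([3, 5, 8, 2, 10, 6, 4, 7])
annotator = np.array([3, 4, 8, 2, 9, 6, 5, 7])

# Quadratic weights penalize large ordinal disagreements more than near-misses
kappa = cohen_kappa_score(expert, annotator, weights="quadratic")
print(f"weighted kappa vs. expert: {kappa:.2f}")

# Higher replication count per image: take the median of several annotators
replicated = np.array([[3, 4, 3, 5, 3],   # image 1, five annotators
                       [8, 7, 8, 8, 9]])  # image 2, five annotators
consensus = np.median(replicated, axis=1)
print(f"consensus MST labels: {consensus}")
```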
There is increasing concern that how researchers currently define and measure fairness is inadequate. Recent calls push to move beyond traditional concepts of fairness and consider related constructs through qualitative and community-based approaches, particularly for underrepresented communities most at risk of AI harm. In one such context, previous research has identified that voice technologies are unfair due to racial and age disparities. This paper uses voice technologies as a case study to unpack how Black older adults value and envision fair and equitable AI systems. We conducted design workshops and interviews with 16 Black older adults, exploring how participants envisioned voice technologies that better understand cultural context and mitigate cultural dissonance. Our findings identify tensions between what it means to have fair, inclusive, and representative voice technologies. This research raises questions about how and whether researchers can model cultural representation with large language models.
A Systematic Review and Thematic Analysis of Community-Collaborative Approaches to Computing Research
Ned Cooper
Tiffanie Horne
Gillian Hayes
Jess Scon Holbrook
Lauren Wilcox
ACM Conference on Human Factors in Computing Systems (ACM CHI) 2022 (2022)
HCI researchers have been gradually shifting attention from individual users to communities when engaging in research, design, and system development. However, our field has yet to establish a cohesive, systematic understanding of the challenges, benefits, and commitments of community-collaborative approaches to research. We conducted a systematic review and thematic analysis of 47 computing research papers discussing participatory research with communities for the development of technological artifacts and systems, published over the last two decades. From this review, we identified seven themes associated with the evolution of a project: from establishing community partnerships to sustaining results. Our findings suggest several tensions characterize these projects, many of which relate to the power and position of researchers, and the computing research environment, relative to community partners. We discuss the implications of our findings and offer methodological proposals to guide HCI, and computing research more broadly, towards practices that center a community.
Artificial intelligence (AI) offers opportunities to solve complex problems facing smallholder farmers in the Global South. However, there is currently a dearth of research and resources available to organizations and policy-makers for building farmer-centered AI systems. As technologists, we believe it is our responsibility to draw from and contribute to research on farmers’ needs, practices, value systems, social worlds, and daily agricultural ecosystem realities. Drawing from our own fieldwork experience and scholarship, we propose concrete future directions for building AI solutions and tools that are meaningful to farmers and will significantly improve their lives. We also discuss tensions that may arise when incorporating AI into farming ecosystems. We hope that a closer look into these research areas will serve as a guide for technologists looking to leverage AI to help smallholder farmers in the Global South.
“Mixture of amazement at the potential of this technology and concern about possible pitfalls”: Public sentiment towards AI in 15 countries
Patrick Gage Kelley
Christopher Moessner
Aaron M Sedley
Allison Woodruff
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 44 (2021), pp. 28-46
Public opinion plays an important role in the development of technology, influencing product adoption, commercial development, research funding, career choices, and regulation. In this paper we present results of an in-depth survey of public opinion of artificial intelligence (AI) conducted with over 17,000 respondents spanning fifteen countries and six continents. Our analysis of open-ended responses regarding sentiment towards AI revealed four key themes (exciting, useful, worrying, and futuristic) which appear to varying degrees in different countries. These sentiments, and their relative prevalence, may inform how the public influences the development of AI.
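A minimal sketch of the kind of tabulation behind themes "appearing to varying degrees in different countries", assuming responses have already been hand-coded into the four themes named above; the file and column names are hypothetical.

```python
# Hedged sketch: tallying coded-theme prevalence per country. This assumes a
# prior qualitative coding step; it is not the paper's analysis pipeline.
import pandas as pd

df = pd.read_csv("coded_responses.csv")  # hypothetical: one row per (response, theme)
# Expected columns: "country", "theme" in {"exciting", "useful", "worrying", "futuristic"}

prevalence = pd.crosstab(df["country"], df["theme"], normalize="index")
print(prevalence.round(2))  # share of coded mentions for each theme, by country
```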
"I don't think these devices are very culturally sensitive." - The impact of errors on African Americans in Automated Speech Recognition
Zion Mengesha
Juliana Sublewski
Elyse Tuennerman
Frontiers in Artificial Intelligence, 26 (2021)
Automated speech recognition (ASR) converts spoken language into text and is used across a variety of applications to assist us in everyday life, from powering virtual assistants and natural language conversations to enabling dictation services. While recent work suggests that there are racial disparities in the performance of ASR systems for speakers of African American Vernacular English, little is known about the psychological and experiential effects of these failures. This paper provides a detailed examination of the behavioral and psychological consequences of ASR voice errors and the difficulty African American users have getting their intents recognized. The results demonstrate that ASR failures have a negative, detrimental impact on African American users. Specifically, African Americans feel othered when using technology powered by ASR: errors surface thoughts about identity, namely about race and geographic location, leaving them feeling that the technology was not made for them. As a result, African Americans accommodate their speech to have better success with the technology. We incorporate the insights and lessons learned from sociolinguistics in our suggestions for linguistically responsive ways to build more inclusive voice systems that consider African American users' needs, attitudes, and speech patterns. Our findings suggest that the use of a diary study can enable researchers to better understand the experiences and needs of communities who are often misunderstood by ASR. We argue this methodological framework could enable researchers who are concerned with fairness in AI to better capture the needs of all speakers who are traditionally misheard by voice-activated, artificially intelligent (voice-AI) digital systems.
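As context for the racial-disparity findings the abstract cites, the sketch below shows how word error rate (WER) gaps across speaker groups are commonly measured with the jiwer library. The transcripts are invented placeholders, and the study itself used a diary methodology, not this computation.

```python
# Hedged sketch: comparing ASR word error rate (WER) across speaker groups.
# The reference/hypothesis pairs below are invented for illustration only.
import jiwer

# Hypothetical reference transcripts and ASR hypotheses, grouped by speaker
groups = {
    "aave_speakers": (["she was finna go home"], ["she was fitting to go home"]),
    "other_speakers": (["she was about to go home"], ["she was about to go home"]),
}

for group, (refs, hyps) in groups.items():
    wer = jiwer.wer(refs, hyps)  # fraction of words substituted/deleted/inserted
    print(f"{group}: WER = {wer:.2f}")
```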