Emily Denton

Authored Publications
    This paper reports on disability representation in images output from text-to-image (T2I) generative AI systems. Through eight focus groups with 25 people with disabilities, we found that models repeatedly presented reductive archetypes for different disabilities. Often these representations reflected broader societal stereotypes and biases, which our participants were concerned to see reproduced through T2I. Our participants discussed further challenges with using these models, including the current reliance on prompt engineering to reach satisfactorily diverse results. Finally, they offered suggestions for how to improve disability representation with solutions like showing multiple, heterogeneous images for a single prompt and including the prompt with images generated. Our discussion reflects on tensions and tradeoffs we found among the diverse perspectives shared to inform future research on representation-oriented generative AI system evaluation metrics and development processes.
    Towards Globally Responsible Generative AI Benchmarks
    Rida Qadri
    ICLR Workshop: Practical ML for Developing Countries Workshop (2023)
    As generative AI globalizes, there is an opportunity to reorient our nascent development frameworks and evaluative practices towards a global context. This paper uses lessons from a community-centered study on the failure modes of text-to-image models in the South Asian context to give suggestions on how the AI/ML community can develop culturally and contextually situated benchmarks. We present three forms of mitigation for culturally situated evaluations: 1) diversifying our diversity measures, 2) participatory prompt dataset curation, and 3) multi-tiered evaluation structures for community engagement. Through these mitigations we present concrete methods to make our evaluation processes more holistic and human-centered while also engaging with the demands of deployment at global scale.
    Large language models (LLMs) trained on real-world data can inadvertently reflect harmful societal biases, particularly toward historically marginalized communities. While previous work has primarily focused on harms related to age and race, emerging research has shown that biases toward disabled communities exist. This study extends prior work exploring the existence of harms by identifying categories of LLM-perpetuated harms toward the disability community. We conducted 19 focus groups, during which 56 participants with disabilities probed a dialog model about disability and discussed and annotated its responses. Participants rarely characterized model outputs as blatantly offensive or toxic. Instead, participants used nuanced language to detail how the dialog model mirrored subtle yet harmful stereotypes they encountered in their lives and dominant media, e.g., inspiration porn and able-bodied saviors. Participants often implicated training data as a cause for these stereotypes and recommended training the model on diverse identities from disability-positive resources. Our discussion further explores representative data strategies to mitigate harm related to different communities through annotation co-design with ML researchers and developers.
    AI’s Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia
    Rida Qadri
    Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, 506–517
    This paper presents a community-centered study of cultural limitations of text-to-image (T2I) models in the South Asian context. We theorize these failures using scholarship on dominant media regimes of representation and locate them within participants' reporting of their existing social marginalizations. We thus show how generative AI can reproduce an outsider's gaze for viewing South Asian cultures, shaped by global and regional power inequities. By centering communities as experts and soliciting their perspectives on T2I limitations, our study adds rich nuance to existing evaluative frameworks and deepens our understanding of the culturally specific ways AI technologies can fail in non-Western and Global South settings. We distill lessons for responsible development of T2I models, recommending concrete pathways forward that can allow for recognition of structural inequalities.
    Human annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights, and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.
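    The five pipeline stages named in the abstract lend themselves to a simple documentation scaffold. The sketch below is a hypothetical illustration: only the stage names come from the abstract, and the function name and example field text are assumptions, not part of the CrowdWorkSheets framework itself.

    ```python
    # Stage names taken from the abstract; everything else is illustrative.
    CROWDWORKSHEETS_STAGES = [
        "task formulation",
        "selection of annotators",
        "platform and infrastructure choices",
        "dataset analysis and evaluation",
        "dataset release and maintenance",
    ]

    def blank_worksheet():
        """Return one empty free-text documentation slot per stage."""
        return {stage: "" for stage in CROWDWORKSHEETS_STAGES}

    sheet = blank_worksheet()
    sheet["task formulation"] = "Binary toxicity labels on forum comments."
    ```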
    In response to growing concerns of bias, discrimination, and unfairness perpetuated by algorithmic systems, the datasets used to train and evaluate machine learning models have come under increased scrutiny. Many of these examinations have focused on the contents of machine learning datasets, finding glaring underrepresentation of minoritized groups. In contrast, relatively little work has been done to examine the norms, values, and assumptions embedded in these datasets. In this work, we conceptualize machine learning datasets as a type of informational infrastructure, and motivate a genealogy as method in examining the histories and modes of constitution at play in their creation. We present a critical history of ImageNet as an exemplar, utilizing critical discourse analysis of major texts around ImageNet's creation and impact. We find that assumptions around ImageNet and other large computer vision datasets more generally rely on three themes: the aggregation and accumulation of more data, the computational construction of meaning, and making certain types of data labor invisible. By tracing the discourses that surround this influential benchmark, we contribute to the ongoing development of the standards and norms around data development in machine learning and artificial intelligence research.
    Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However, the datasets which empower machine learning are often used, shared, and re-used with little visibility into the processes of deliberation which led to their creation. Which stakeholder groups had their perspectives included when the dataset was conceived? Which domain experts were consulted regarding how to model subgroups and other phenomena? How were questions of representational biases measured and addressed? Who labeled the data? In this paper, we introduce a rigorous framework for dataset development transparency which supports decision-making and accountability. The framework uses the cyclical, infrastructural, and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each stage of the data development lifecycle yields a set of documents that facilitate improved communication and decision-making, and draw attention to the value and necessity of careful data work. The proposed framework is intended to contribute to closing the accountability gap in artificial intelligence systems, by making visible the often overlooked work that goes into dataset creation.
    Data and its (dis)contents: A survey of dataset development and use in machine learning research
    Amandalynne Paullada
    Inioluwa Deborah Raji
    Emily Bender
    Alex Hanna
    Patterns (2021)
    Datasets form the basis for training, evaluating, and benchmarking machine learning models and have played a foundational role in the advancement of the field. Furthermore, the ways in which we collect, construct, and share these datasets inform the kinds of problems the field pursues and the methods explored in algorithm development. In this work, we survey recent issues pertaining to data in machine learning research, focusing primarily on work in computer vision and natural language processing. We summarize concerns relating to the design, collection, maintenance, distribution, and use of machine learning datasets as well as broader disciplinary norms and cultures that pervade the field. We advocate a turn in the culture toward more careful practices of development, maintenance, and distribution of datasets that are attentive to limitations and societal impact while respecting the intellectual property and privacy rights of data creators and data subjects.
    Art Sheets for Art Datasets
    Ramya Malur Srinivasan
    Jordan Jennifer Famularo
    Beth Coleman
    NeurIPS Dataset & Benchmark track (2021)
    As machine learning (ML) techniques are being employed to authenticate artworks and estimate their market value, computational tasks have expanded across a variety of creative domains and datasets drawn from the arts. With recent progress in generative modeling, ML techniques are also used for simulating artistic styles and for producing new content in various media such as music, visual arts, poetry, etc. While this progress has opened up new creative avenues, it has also paved the way for adverse downstream effects such as cultural appropriation (e.g., cultural misrepresentation, offense, and undervaluing) and amplification of gender and racial stereotypes, to name a few. Many such concerning issues stem from the training data in ways that diligent evaluation can uncover, prevent, and mitigate. In this paper, we provide a checklist of questions customized for use with art datasets, building on the questionnaire for datasets provided in Datasheets, by guiding assessment of developer motivation together with dataset provenance, composition, collection, pre-processing, cleaning, labeling, use (including data generation/synthesis), distribution, and maintenance. Case studies exemplify the value of our questionnaire. We hope our work aids ML scientists and developers by providing a framework for responsible design, development, and use of art datasets.
    Human annotations play a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into building ML datasets, which essentially shape the research trajectories within our field, have not received nearly enough attention. In this paper, we survey an array of literature on human computation, with a focus on ethical considerations around crowdsourcing. We synthesize these insights, and lay out the challenges in this space along two layers: (1) who the annotator is and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms and what that relationship affords them. Finally, we put forth a concrete set of recommendations and considerations for dataset developers at various stages of the ML data pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset documentation and release.
    Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
    Bernard Koch
    Alex Hanna
    Jacob Foster
    NeurIPS Dataset & Benchmark track (2021)
    Datasets form the backbone of machine learning research. They are deeply integrated into work practices of machine learning researchers, operating as resources for training and testing machine learning models. Moreover, datasets serve a central role in the organization of machine learning as a scientific field. Benchmark datasets formalize tasks and coordinate scientists around shared research problems. Advancement on these benchmarks is considered a key signal for collective progress, and is thus also an important form of social capital to motivate and evaluate individual researchers. Given their central organizing role, datasets have also become a central object of critical inquiry in recent years. For example, dataset audits have revealed pervasive biases, studies of disciplinary norms of dataset development have revealed concerning practices relating to dataset development and dissemination, and a host of concerns about benchmarking practices has emerged, calling into question the validity of measurements. However, comparatively little attention has been paid to the dynamics of dataset use within and across machine learning subcommunities. In this work we dig into these dynamics by studying how dataset usage patterns differ across machine learning subcommunities and across time from 2014 to 2021.
    Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
    Morgan Klaus Scheuerman
    Alex Hanna
    The 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (2021)
    Data is a crucial component of machine learning; a model is reliant on data to train, validate, and test it. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research practices and impact human life, we sought to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examined what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a large corpus of computer vision datasets, from which we sampled 114 databases across different vision tasks. We document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision database authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identified sit in opposition with human-centered data practices, which we reference in our suggestions for better incorporating silenced values into the dataset curation process.
    Towards a Critical Race Methodology in Algorithmic Fairness
    Alex Hanna
    ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) (2020)
    We examine the way race and racial categories are adopted in algorithmic fairness frameworks. Current methodologies fail to adequately account for the socially constructed nature of race, instead adopting a conceptualization of race as a fixed attribute. Treating race as an attribute, rather than a structural, institutional, and relational phenomenon, can serve to minimize the structural aspects of algorithmic unfairness. In this work, we focus on the history of racial categories and turn to critical race theory and sociological work on race and ethnicity to ground conceptualizations of race for fairness research, drawing on lessons from public health, biomedical research, and social survey research. We argue that algorithmic fairness researchers need to take into account the multidimensionality of race, take seriously the processes of conceptualizing and operationalizing race, focus on social processes which produce racial inequality, and consider perspectives of those most affected by sociotechnical systems.
    Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing
    Inioluwa Deborah Raji
    Timnit Gebru
    Margaret Mitchell
    Joy Buolamwini
    Proceedings of the 3rd AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), ACM (2020)
    Although essential to revealing biased performance, well-intentioned attempts at algorithmic auditing can have effects that may harm the very populations these measures are meant to protect. This concern is even more salient while auditing biometric systems such as facial recognition, where the data is sensitive and the technology is often used in ethically questionable manners. We demonstrate a set of five ethical concerns in the particular case of auditing commercial facial processing technology, highlighting additional design considerations and ethical tensions the auditor needs to be aware of so as not to exacerbate or complement the harms propagated by the audited system. We go further to provide tangible illustrations of these concerns, and conclude by reflecting on what these concerns mean for the role of the algorithmic audit and the fundamental product limitations they reveal.
    Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
    Alex Hanna
    Razvan Amironesei
    Hilary Nicole
    Morgan Klaus Scheuerman
    Participatory Approaches to Machine Learning, ICML 2020 Workshop (2020)
    In response to algorithmic unfairness embedded in sociotechnical systems, significant attention has been focused on the contents of machine learning datasets, which have revealed biases towards white, cisgender, male, and Western data subjects. In contrast, comparatively less attention has been paid to the histories, values, and norms embedded in such datasets. In this work, we outline a research program, a genealogy of machine learning data, for investigating how and why these datasets have been created, what and whose values influence the choices of data to collect, and the contextual and contingent conditions of their creation. We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets. This interrogation forces us to "bring the people back in" by aiding us in understanding the labor embedded in dataset construction, and thereby presenting new avenues of contestation for other researchers encountering the data.
    Characterising Bias in Compressed Models
    Sara Hooker
    Nyalleng Moorosi
    Gregory Clark
    Samy Bengio
    The popularity and widespread use of pruning and quantization are driven by the severe resource constraints of deploying deep neural networks to environments with strict latency, memory, and energy requirements. These techniques achieve high levels of compression with negligible impact on top-line metrics (top-1 and top-5 accuracy). However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE). We further establish that for CIE examples, compression amplifies existing algorithmic bias. Pruning disproportionately impacts performance on underrepresented features, which often coincides with considerations of fairness. Given that CIE is a relatively small subset but a disproportionate contributor of error in the model, we propose its use as a human-in-the-loop auditing tool to surface a tractable subset of the dataset for further inspection or annotation by a domain expert. We provide qualitative and quantitative support that CIE surfaces the most challenging examples in the data distribution for human-in-the-loop auditing.
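    The core of the CIE idea above can be sketched as a per-example comparison of predictions from the full and compressed models. This is a minimal illustrative sketch, not the paper's implementation; the function name and toy labels are assumptions.

    ```python
    def compression_identified_exemplars(full_preds, compressed_preds):
        """Indices of examples where the compressed model's prediction
        disagrees with the full model's; these are candidates for
        human-in-the-loop auditing."""
        return [i for i, (f, c) in enumerate(zip(full_preds, compressed_preds))
                if f != c]

    # Toy predicted class labels for ten evaluation examples.
    full = [0, 1, 1, 0, 2, 2, 1, 0, 0, 1]
    comp = [0, 1, 0, 0, 2, 2, 1, 0, 1, 1]
    cie = compression_identified_exemplars(full, comp)
    print(cie)  # [2, 8]
    ```

    In practice the two prediction vectors would come from running the uncompressed and pruned/quantized models over the same evaluation set; the disagreement indices then form the small, tractable subset handed to a domain expert.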
    Building equitable and inclusive technologies demands paying attention to how social attitudes towards persons with disabilities are represented within technology. Representations perpetuated by NLP models often inadvertently encode undesirable social biases from the data on which they are trained. In this paper, first we present evidence of such undesirable biases towards mentions of disability in two different NLP models: toxicity prediction and sentiment analysis. Next, we demonstrate that neural embeddings that are critical first steps in most NLP pipelines also contain undesirable biases towards mentions of disabilities. We then expose the topical biases in the social discourse about some disabilities which may explain such biases in the models; for instance, terms related to gun violence, homelessness, and drug addiction are over-represented in discussions about mental illness.
    Diversity and Inclusion Metrics for Subset Selection
    Margaret Mitchell
    Dylan Baker
    Nyalleng Moorosi
    Alex Hanna
    Timnit Gebru
    Jamie Morgenstern
    Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), ACM (2020)
    The concept of fairness has recently been applied in machine learning settings to describe a wide range of constraints and objectives. When applied to ranking, recommendation, or subset selection problems for an individual, it becomes less clear that fairness goals are more applicable than goals that prioritize diverse outputs and instances that represent the individual's goals well. In this work, we discuss the relevance of the concept of fairness to the concepts of diversity and inclusion, and introduce metrics that quantify the diversity and inclusion of an instance or set. Diversity and inclusion metrics can be used in tandem with additional fairness constraints, or may be used separately, and we detail how the different metrics interact. Results from human subject experiments demonstrate that the proposed criteria for diversity and inclusion are consistent with social notions of these two concepts, and human judgments on the diversity and inclusion of example instances are correlated with the defined metrics.
    Detecting Bias with Generative Counterfactual Face Attribute Augmentation
    Margaret Mitchell
    Timnit Gebru
    Fairness, Accountability, Transparency and Ethics in Computer Vision Workshop (in conjunction with CVPR) (2019)
    We introduce a simple framework for identifying biases of a smiling attribute classifier. Our method poses counterfactual questions of the form: how would the prediction change if this face characteristic had been different? We leverage recent advances in generative adversarial networks to build a realistic generative model of faces that affords controlled manipulation of specific facial characteristics. Empirically, we identify several different factors of variation (which we believe should be independent of smiling) that affect the predictions of a smiling classifier trained on CelebA.
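    The counterfactual question posed above can be sketched as a generic sensitivity test: alter one attribute, hold everything else fixed, and measure how much the classifier's score moves. The paper performs the manipulation with a generative model of faces; the dictionary-based stand-in below is purely illustrative, and all names are assumptions.

    ```python
    def counterfactual_sensitivity(classifier, example, manipulate):
        """Change in classifier score when one attribute is altered and
        everything else is held fixed; nonzero values flag a dependence
        on that attribute."""
        return classifier(manipulate(example)) - classifier(example)

    # Toy "smiling" scorer that illicitly keys on an unrelated attribute,
    # the kind of dependence the counterfactual probe is meant to expose.
    def toy_smile_score(face):
        return 0.9 if face["wearing_glasses"] else 0.4

    def add_glasses(face):
        changed = dict(face)
        changed["wearing_glasses"] = True
        return changed

    face = {"smiling": True, "wearing_glasses": False}
    delta = counterfactual_sensitivity(toy_smile_score, face, add_glasses)
    print(round(delta, 2))  # 0.5
    ```

    A real instantiation would replace the dictionary edit with a GAN-based manipulation of the image and the toy scorer with the trained CelebA smiling classifier.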
    Persons with disabilities face many barriers to participation in society, and the rapid advancement of technology creates ever more of them. Achieving fair opportunity and justice for people with disabilities demands paying attention not just to accessibility, but also to the attitudes towards, and representations of, disability that are implicit in machine learning (ML) models that are pervasive in how one engages with society. However, such models often inadvertently learn to perpetuate undesirable social biases from the data on which they are trained. This can result, for example, in models for classifying text producing very different predictions for "I stand by a person with mental illness" and "I stand by a tall person". We present evidence of such social biases in existing ML models, along with an analysis of biases in a dataset used for model development.
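    The sentence-pair comparison in the abstract above suggests a simple perturbation-style probe: fill a template with different phrases and compare the model's scores. The harness below is a hedged sketch; the stand-in scorer is hypothetical, and a real probe would call a trained toxicity or sentiment model in its place.

    ```python
    def perturbation_gap(template, phrases, score_fn):
        """Score each filled-in template and return the scores plus the
        max-min gap; a large gap flags potential bias in score_fn."""
        scores = {p: score_fn(template.format(p)) for p in phrases}
        return scores, max(scores.values()) - min(scores.values())

    # Hypothetical stand-in scorer for illustration only.
    def toy_score(sentence):
        return 0.8 if "mental illness" in sentence else 0.1

    scores, gap = perturbation_gap(
        "I stand by a {}.",
        ["person with mental illness", "tall person"],
        toy_score,
    )
    print(round(gap, 2))  # 0.7
    ```

    Because the two sentences differ only in the substituted phrase, any score gap is attributable to the mention of disability rather than to sentence structure, which is what makes template perturbation a useful bias probe.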