Remi Denton

Authored Publications
    Recent studies have highlighted the issue of varying degrees of stereotypical depictions for different identity groups. However, these approaches have several key limitations, including noticeably sparse coverage of identity groups in their evaluations and of the range of stereotypes associated with them. These studies also often lack a critical distinction between inherently visual stereotypes, such as 'brown' or 'sombrero', and culturally influenced stereotypes such as 'kind' or 'intelligent'. In this work, we address these limitations by leveraging existing textual resources to ground our evaluation of regional, geo-cultural stereotypes in images generated by text-to-image models. We employ existing stereotype benchmarks and focus exclusively on identifying visual stereotypes within generated images spanning 135 identity groups. We also compute offensiveness across identity groups and assess the feasibility of identifying stereotypes automatically. Further, through a detailed case study and quantitative analysis, we reveal how the default representations of all identity groups have a more stereotypical appearance and how, for historically marginalized groups, images across different attributes are visually more similar than for other groups, even when explicitly prompted otherwise.
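A minimal sketch of the kind of per-group aggregation this evaluation implies, not the paper's actual pipeline: the records, the binary stereotype flag, and the offensiveness scores below are hypothetical stand-ins for a detector's per-image outputs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-image records: each generated image is tagged with the
# identity group named in its prompt, a binary visual-stereotype flag, and
# an offensiveness score in [0, 1].
records = [
    {"group": "group_a", "stereotype": 1, "offensiveness": 0.62},
    {"group": "group_a", "stereotype": 0, "offensiveness": 0.08},
    {"group": "group_b", "stereotype": 1, "offensiveness": 0.35},
    {"group": "group_b", "stereotype": 1, "offensiveness": 0.41},
]

def aggregate_by_group(records):
    """Compute per-group stereotype prevalence and mean offensiveness."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["group"]].append(record)
    return {
        group: {
            "stereotype_rate": mean(r["stereotype"] for r in items),
            "mean_offensiveness": mean(r["offensiveness"] for r in items),
        }
        for group, items in grouped.items()
    }

print(aggregate_by_group(records))
```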
    This paper reports on disability representation in images output from text-to-image (T2I) generative AI systems. Through eight focus groups with 25 people with disabilities, we found that models repeatedly presented reductive archetypes for different disabilities. Often these representations reflected broader societal stereotypes and biases, which our participants were concerned to see reproduced through T2I. Our participants discussed further challenges with using these models, including the current reliance on prompt engineering to reach satisfactorily diverse results. Finally, they offered suggestions for how to improve disability representation, with solutions like showing multiple, heterogeneous images for a single prompt and including the prompt with the images generated. Our discussion reflects on the tensions and tradeoffs we found among the diverse perspectives shared, to inform future research on representation-oriented generative AI system evaluation metrics and development processes.
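Participants' two concrete suggestions, returning several heterogeneous images per prompt and keeping the prompt attached to each image, can be pictured as an interface contract. The sketch below is illustrative only; GeneratedImage, generate_fn, and the stand-in generator are assumptions for this example, not part of the paper.

```python
from dataclasses import dataclass

@dataclass
class GeneratedImage:
    prompt: str      # the prompt travels with the image, per participants' suggestion
    image_path: str
    seed: int

def generate_diverse_set(prompt, generate_fn, n_variants=4):
    """Return several candidates for one prompt, varying the seed so the
    set is heterogeneous rather than a single default depiction."""
    return [
        GeneratedImage(prompt=prompt, image_path=generate_fn(prompt, seed), seed=seed)
        for seed in range(n_variants)
    ]

# Stand-in generator for this sketch; a real system would call a T2I model.
def fake_generate(prompt, seed):
    return f"/images/{abs(hash((prompt, seed)))}.png"

for img in generate_diverse_set("a person at work", fake_generate):
    print(img.seed, img.image_path, "| prompt:", img.prompt)
```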
    AI’s Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia
    Rida Qadri
    Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, 506–517
    This paper presents a community-centered study of the cultural limitations of text-to-image (T2I) models in the South Asian context. We theorize these failures using scholarship on dominant media regimes of representation and locate them within participants' reporting of their existing social marginalizations. We thus show how generative AI can reproduce an outsider's gaze for viewing South Asian cultures, shaped by global and regional power inequities. By centering communities as experts and soliciting their perspectives on T2I limitations, our study adds rich nuance to existing evaluative frameworks and deepens our understanding of the culturally specific ways AI technologies can fail in non-Western and Global South settings. We distill lessons for the responsible development of T2I models, recommending concrete pathways forward that allow for recognition of structural inequalities.
    Large language models (LLMs) trained on real-world data can inadvertently reflect harmful societal biases, particularly toward historically marginalized communities. While previous work has primarily focused on harms related to age and race, emerging research has shown that biases toward disabled communities exist. This study extends prior work on the existence of such harms by identifying categories of LLM-perpetuated harms toward the disability community. We conducted 19 focus groups, during which 56 participants with disabilities probed a dialog model about disability and discussed and annotated its responses. Participants rarely characterized model outputs as blatantly offensive or toxic. Instead, they used nuanced language to detail how the dialog model mirrored subtle yet harmful stereotypes they encountered in their lives and in dominant media, e.g., inspiration porn and able-bodied saviors. Participants often implicated training data as a cause of these stereotypes and recommended training the model on diverse identities from disability-positive resources. Our discussion further explores representative data strategies to mitigate harm related to different communities through annotation co-design with ML researchers and developers.
    Towards Globally Responsible Generative AI Benchmarks
    Rida Qadri
    ICLR Workshop on Practical ML for Developing Countries (2023)
    As generative AI globalizes, there is an opportunity to reorient our nascent development frameworks and evaluative practices toward a global context. This paper uses lessons from a community-centered study on the failure modes of text-to-image models in the South Asian context to give suggestions on how the AI/ML community can develop culturally and contextually situated benchmarks. We present three forms of mitigation for culturally situated evaluations: 1) diversifying our diversity measures, 2) participatory prompt dataset curation, and 3) multi-tiered evaluation structures for community engagement. Through these mitigations we present concrete methods to make our evaluation processes more holistic and human-centered while also engaging with the demands of deployment at global scale.
    Human annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.
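One way to picture CrowdWorkSheets is as a structured record with one free-text section per decision point the abstract enumerates. The sketch below is illustrative only: the field names mirror the abstract's five stages, and the example values are invented, not drawn from the paper.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class CrowdWorkSheet:
    """One free-text section per decision point named in the abstract."""
    task_formulation: str
    selection_of_annotators: str
    platform_and_infrastructure: str
    dataset_analysis_and_evaluation: str
    release_and_maintenance: str

# Example values are invented for illustration.
sheet = CrowdWorkSheet(
    task_formulation="Rate dialog responses for subtle stereotyping; guidelines v2.",
    selection_of_annotators="Raters recruited across regions and lived experiences.",
    platform_and_infrastructure="Internal annotation tool; pay above local minimum wage.",
    dataset_analysis_and_evaluation="Inter-annotator agreement reported per subgroup.",
    release_and_maintenance="Versioned releases with a contact channel for corrections.",
)
print(json.dumps(asdict(sheet), indent=2))
```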
    Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However, the datasets that empower machine learning are often used, shared, and re-used with little visibility into the processes of deliberation that led to their creation. Which stakeholder groups had their perspectives included when the dataset was conceived? Which domain experts were consulted regarding how to model subgroups and other phenomena? How were questions of representational biases measured and addressed? Who labeled the data? In this paper, we introduce a rigorous framework for dataset development transparency that supports decision-making and accountability. The framework uses the cyclical, infrastructural, and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each stage of the data development lifecycle yields a set of documents that facilitate improved communication and decision-making, while also drawing attention to the value and necessity of careful data work. The proposed framework is intended to help close the accountability gap in artificial intelligence systems by making visible the often overlooked work that goes into dataset creation.
    Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
    Morgan Klaus Scheuerman
    Alex Hanna
    The 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (2021)
    Data is a crucial component of machine learning; a model relies on data to train, validate, and test it. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research practices and impact human life, we sought to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examined what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a large corpus of computer vision datasets, from which we sampled 114 datasets across different vision tasks. We document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identified sit in opposition to human-centered data practices, which we reference in our suggestions for better incorporating silenced values into the dataset curation process.
    Human annotations play a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into building ML datasets, which essentially shape the research trajectories within our field, have not received nearly enough attention. In this paper, we survey an array of literature on human computation, with a focus on ethical considerations around crowdsourcing. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms and what that relationship affords them. Finally, we put forth a concrete set of recommendations and considerations for dataset developers at various stages of the ML data pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset documentation and release.
    Data and its (dis)contents: A survey of dataset development and use in machine learning research
    Amandalynne Paullada
    Inioluwa Deborah Raji
    Emily Bender
    Alex Hanna
    Patterns (2021)
    Datasets form the basis for training, evaluating, and benchmarking machine learning models and have played a foundational role in the advancement of the field. Furthermore, the ways in which we collect, construct, and share these datasets inform the kinds of problems the field pursues and the methods explored in algorithm development. In this work, we survey recent issues pertaining to data in machine learning research, focusing primarily on work in computer vision and natural language processing. We summarize concerns relating to the design, collection, maintenance, distribution, and use of machine learning datasets, as well as broader disciplinary norms and cultures that pervade the field. We advocate a turn in the culture toward more careful practices of development, maintenance, and distribution of datasets that are attentive to limitations and societal impact while respecting the intellectual property and privacy rights of data creators and data subjects.