
Responsible AI

Our research in Responsible AI aims to shape the field of artificial intelligence and machine learning in ways that foreground the human experiences and impacts of these technologies. We examine and influence emerging AI models, systems, and datasets used in research, development, and practice. This research uncovers foundational insights and devises methodologies that define the state of the art across the field. We advance equity, fairness, transparency, robustness, interpretability, and inclusivity as key elements of AI systems. For example, recent work evaluates how well the fairness properties of medical AI algorithms generalize and examines the cultural dimensions of fair AI systems around the world. We strive to ensure that the promise of AI is realized for the benefit of all individuals and communities, prioritizing social and contextual implications.

Recent Publications

Generative AI in Creative Practice: ML-Artist Folk Theories of T2I Use, Harm, and Harm-Reduction
Shalaleh Rismani
Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), Association for Computing Machinery (2024), pp. 1-17 (to appear)
Understanding how communities experience algorithms is necessary to mitigate potential harmful impacts. This paper presents folk theories of text-to-image (T2I) models to enrich understanding of how artist communities experience creative machine learning (ML) systems. This research draws on data collected from a workshop with 15 artists from 10 countries who incorporate T2I models in their creative practice. Through reflexive thematic analysis of workshop data, we highlight theorization of T2I use, harm, and harm-reduction. Folk theories of use envision T2I models as an artistic medium, a mundane tool, and locate true creativity as rising above model affordances. Theories of harm articulate T2I models as harmed by engineering efforts to eliminate glitches and product policy efforts to limit functionality. Theories of harm-reduction orient towards protecting T2I models for creative practice through transparency and distributed governance. We examine how these theories relate, and conclude by discussing how folk theorization informs responsible AI efforts.
Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Massive Multitask Language Understanding) is among the worst performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and answers from the model, which in turn helps improve the accuracy on the Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning doesn't work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot. We further observed that with minimal human supervision in the form of 5 few-shot examples, the accuracy of the task can be improved to as much as 80%.
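The Thought Experiments framework is prompt-based. The sketch below is a rough illustration only: the `generate` helper, the prompt wording, and the three-step staging are assumptions for exposition, not the paper's released prompts.

```python
# Hypothetical sketch of a counterfactual "thought experiment" prompting pipeline.
# `generate` stands in for any LLM text-completion call; the prompt wording is
# illustrative and not taken from the paper.

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion API (e.g., a hosted model endpoint)."""
    raise NotImplementedError

def thought_experiment_answer(scenario: str, choices: list[str]) -> str:
    # Step 1: elicit counterfactual questions about the moral scenario.
    questions = generate(
        f"Scenario: {scenario}\n"
        "Let's do a thought experiment. Pose counterfactual questions that would "
        "change the moral character of this scenario."
    )
    # Step 2: have the model answer its own counterfactual questions.
    answers = generate(
        f"Scenario: {scenario}\nCounterfactual questions:\n{questions}\n"
        "Answer each question briefly."
    )
    # Step 3: use the counterfactual reasoning to choose a final label.
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(choices))
    return generate(
        f"Scenario: {scenario}\nThought experiment:\n{questions}\n{answers}\n"
        f"Given this reasoning, which option is correct?\n{options}\n"
        "Reply with the option number only."
    )
```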
Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. We call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that reduce human effort significantly and enable integration of adversarial testing earlier in new product development. AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g. sensitive and harmful concepts, specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes to define, scope and prioritize diversity within the application context. This feeds into a structured LLM-generation process that scales up evaluation priorities. Compared to some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality.
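AART is described as a recipe-driven pipeline. The following is a loose sketch under stated assumptions: the `RedTeamRecipe` fields, the `generate` placeholder, and the prompt template are illustrative and are not the released tool's API.

```python
# Hypothetical sketch of a recipe-driven adversarial data generation loop in the
# spirit of AART; recipe fields and prompt templates are illustrative only.
from dataclasses import dataclass, field
from itertools import product

@dataclass
class RedTeamRecipe:
    application: str                  # downstream use case under test
    harm_concepts: list[str]          # sensitive or harmful concepts to probe
    regions: list[str]                # cultural and geographic contexts
    scenarios: list[str] = field(default_factory=lambda: ["direct request"])

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call used to draft adversarial queries."""
    raise NotImplementedError

def build_eval_set(recipe: RedTeamRecipe) -> list[dict]:
    """Expand the recipe into (concept, region, scenario) cells and draft one
    adversarial query per cell, so diversity is scoped before generation scales up."""
    dataset = []
    for concept, region, scenario in product(
        recipe.harm_concepts, recipe.regions, recipe.scenarios
    ):
        query = generate(
            f"Write a user query for a {recipe.application} assistant that "
            f"probes the concept '{concept}' in a {region} context, "
            f"framed as a {scenario}."
        )
        dataset.append(
            {"concept": concept, "region": region, "scenario": scenario, "query": query}
        )
    return dataset
```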
As new forms of data capture emerge to power new AI applications, questions abound about the ethical implications of these data collection practices. In this paper, we present clinicians' perspectives on the prospective benefits and harms of voice data collection during health consultations. Such data collection is being proposed as a means to power models to assist clinicians with medical data entry, administrative tasks, and consultation analysis. Yet, clinicians' attitudes and concerns are largely absent from the AI narratives surrounding these use cases, and the academic literature investigating them. Our qualitative interview study used the concept of an informed consent process as a type of design fiction, to support elicitation of clinicians' perspectives on voice data collection and use associated with a fictional, near-term AI assistant. Through reflexive thematic analysis of in-depth sessions with physicians, we distilled eight classes of potential risks that clinicians are concerned about, including workflow disruptions, self-censorship, and errors that could impact patient eligibility for services. We conclude with an in-depth discussion of these prospective risks, reflect on the use of the speculative processes that illuminated them, and reconsider evaluation criteria for AI-assisted clinical documentation technologies in light of our findings.
AI’s Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia
Rida Qadri
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, pp. 506–517
This paper presents a community-centered study of cultural limitations of text-to-image (T2I) models in the South Asian context. We theorize these failures using scholarship on dominant media regimes of representations and locate them within participants' reporting of their existing social marginalizations. We thus show how generative AI can reproduce an outsider's gaze for viewing South Asian cultures, shaped by global and regional power inequities. By centering communities as experts and soliciting their perspectives on T2I limitations, our study adds rich nuance to existing evaluative frameworks and deepens our understanding of the culturally specific ways AI technologies can fail in non-Western and Global South settings. We distill lessons for responsible development of T2I models, recommending concrete pathways forward that can allow for recognition of structural inequalities.
This paper demonstrates how the limitations of pre-trained models and open evaluation datasets factor into assessing the performance of binary semantic similarity classification tasks. Because (1) end-user-facing documentation around the curation of these datasets and pre-trained model training regimes is often not easily accessible, and (2) there is lower friction and higher demand to quickly deploy such systems in real-world contexts, our study reinforces prior work showing performance disparities across datasets, embedding techniques and distance metrics, while highlighting the importance of understanding how data is collected, curated and analyzed in semantic similarity classification.
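The class of system examined here is easy to assemble from off-the-shelf parts, which is part of why undocumented choices matter. A minimal sketch, assuming the sentence-transformers library with an arbitrarily chosen encoder and threshold, shows how many such choices (embedding model, distance metric, decision cutoff) a binary similarity classifier silently bakes in:

```python
# Minimal sketch of a binary semantic-similarity classifier built from an
# off-the-shelf embedding model; the model name, metric, and threshold are
# illustrative choices, each of which can shift reported performance.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # one of many possible pre-trained encoders

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance metric; swapping in Euclidean or dot-product changes results."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def are_similar(text_a: str, text_b: str, threshold: float = 0.7) -> bool:
    """Label a sentence pair 'similar' if cosine similarity exceeds a fixed threshold."""
    emb_a, emb_b = model.encode([text_a, text_b])
    return cosine_similarity(emb_a, emb_b) >= threshold

# The verdict depends on the encoder, the metric, and the threshold, none of
# which are documented choices in many deployed systems.
print(are_similar("The bank closed my account.", "My account was shut by the bank."))
```

Any of these choices can shift measured accuracy across evaluation datasets, which is why the paper stresses understanding how the data and models behind such pipelines are collected and curated.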
