Awa Dieng

I am a Research Associate on the Google Brain team, working on Machine Learning, Causality, and Algorithmic Fairness.

Authored Publications
Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest human evaluation study in this area to date. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven newly released datasets comprising both manually curated and LLM-generated questions enriched for adversarial queries. Both our human assessment framework and dataset design process are grounded in an iterative participatory approach and review of possible biases in Med-PaLM 2 answers to adversarial queries. Through our empirical study, we find that the use of a collection of datasets curated through a variety of methodologies, coupled with a thorough evaluation protocol that leverages multiple assessment rubric designs and diverse rater groups, surfaces biases that may be missed via narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. We emphasize that while our framework can identify specific forms of bias, it is not sufficient to holistically assess whether the deployment of an AI system promotes equitable health outcomes. We hope the broader community leverages and builds on these tools and methods towards realizing a shared goal of LLMs that promote accessible and equitable healthcare for all.
    Diagnosing failures of fairness transfer across distribution shift in real-world medical settings
    Sanmi Koyejo
    Eva Schnider
    Krista Opsahl-Ong
    Alex Brown
    Christina Chen
    Yuan Liu
    Silvia Chiappa
    Alexander Nicholas D'Amour
    Proceedings of Neural Information Processing Systems (NeurIPS), 2022
    Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two medical applications, we show that this knowledge can help diagnose failures of fairness transfer, including cases where real-world shifts are more complex than is often assumed in the literature. Based on these results, we discuss potential remedies at each step of the machine learning pipeline.
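The conditional independence tests this abstract refers to can be illustrated with a minimal sketch. The example below uses a partial-correlation test (assuming roughly linear-Gaussian data); the variable names and the choice of test are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np
from scipy import stats

def partial_corr_ci_test(x, y, z, alpha=0.05):
    """Test the null hypothesis X independent of Y given Z via
    partial correlation. A sketch assuming roughly linear-Gaussian
    relationships. Returns (p_value, reject); reject=True means
    conditional independence is rejected at level alpha."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    # Residualize x and y on z; the correlation of the residuals
    # is the partial correlation of x and y given z.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    # Fisher z-transform, degrees of freedom adjusted for the
    # size of the conditioning set.
    k = Z.shape[1] - 1
    z_stat = np.sqrt(n - k - 3) * 0.5 * np.log((1 + r) / (1 - r))
    p_value = 2.0 * (1.0 - stats.norm.cdf(abs(z_stat)))
    return p_value, p_value < alpha
```

In the distribution-shift setting, such a test would be applied to triples of variables (e.g., outcome, group attribute, environment indicator) to probe which conditional independences hold across the source and target distributions.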
    Causal inference methods for combining randomized trials and observational studies: a review
    Benedicte Colnet
    Imke Mayer
    Guanhua Chen
    Ruohong Li
    Gaël Varoquaux
    Jean-Philippe Vert
    Julie Josse
    Shu Yang
    With increasing data availability, treatment causal effects can be evaluated across different datasets, both randomized trials and observational studies. Randomized trials isolate the effect of the treatment from that of unwanted (confounding) co-occurring effects, but they may be applied to limited populations and thus lack external validity. In contrast, large observational samples are often more representative of the target population but can conflate confounding effects with the treatment of interest. In this paper, we review the growing literature on methods for causal inference on combined randomized trial and observational studies, striving for the best of both worlds. We first discuss identification and estimation methods that improve generalizability of randomized controlled trials (RCTs) using the representativeness of observational data. Classical estimators include weighting, difference between conditional outcome models, and doubly robust estimators. We then discuss methods that combine RCTs and observational data to improve the (conditional) average treatment effect estimation, handling possible unmeasured confounding in the observational data. We also connect and contrast works developed in both the potential outcomes framework and the structural causal models framework. Finally, we compare the main methods using a simulation study and real-world data to analyse the effect of tranexamic acid on the mortality rate in major trauma patients. Code to implement many of the methods is provided.
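The "weighting" family of estimators mentioned in this abstract can be sketched briefly: reweight RCT units by their estimated odds of belonging to the observational (target) sample so that the trial's contrast generalizes to the target population. The single-covariate setup, the gradient-descent logistic fit, and all names below are simplifying assumptions for illustration, not the review's implementation:

```python
import numpy as np

def ipsw_ate(y, t, x_rct, x_obs, pi_treat=0.5):
    """Inverse-probability-of-sampling-weighted ATE (a sketch).

    y, t, x_rct: outcomes, binary treatment, and a single covariate
    for the RCT units; x_obs: the same covariate in the observational
    (target) sample; pi_treat: known trial randomization probability."""
    # Label sample membership: 0 = RCT, 1 = observational/target.
    x_all = np.concatenate([x_rct, x_obs])
    s = np.concatenate([np.zeros(len(x_rct)), np.ones(len(x_obs))])
    Xb = np.column_stack([np.ones_like(x_all), x_all])
    # Fit a logistic selection model by plain gradient ascent on the
    # log-likelihood (crude on purpose; any logistic solver would do).
    w = np.zeros(2)
    for _ in range(3000):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += 0.2 * Xb.T @ (s - p) / len(s)
    # Odds of target membership for each RCT unit -> sampling weights.
    p_rct = 1.0 / (1.0 + np.exp(-(w[0] + w[1] * x_rct)))
    omega = p_rct / (1.0 - p_rct)
    omega = omega / omega.sum()
    # Weighted Horvitz-Thompson contrast inside the trial, now
    # targeting the population the observational sample represents.
    return float(np.sum(omega * (t * y / pi_treat
                                 - (1 - t) * y / (1 - pi_treat))))
```

When the treatment effect varies with the covariate and the observational sample has a shifted covariate distribution, this estimator recovers the target-population effect while a naive difference in means within the trial does not.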