Flavien Prost

Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Towards A Scalable Solution for Improving Multi-Group Fairness in Compositional Classification
    Tina Tian
    Ben Packer
    Meghana Deodhar
    Alex Beutel
    The Second Workshop on Spurious Correlations, Invariance and Stability @ ICML 2023(2023)
    Preview abstract Despite the rich literature on machine learning fairness, relatively little attention has been paid to remediating complex systems, where the final prediction is the combination of multiple classifiers and where multiple groups are present. In this paper, we first show that natural baseline approaches for improving equal opportunity fairness scale linearly with the product of the number of remediated groups and the number of remediated prediction labels, rendering them impractical. We then introduce two simple techniques, called task-overconditioning and group-interleaving, to achieve a constant scaling in this multi-group multi-label setup. Our experimental results in academic and real-world environments demonstrate the effectiveness of our proposal at mitigation within this environment. View details
    Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective
    Aditee Ajit Kumthekar
    Alex Beutel
    Li Wei
    Nick Blumm
    Pranjal Awasthi
    Trevor Potter
    AIES(2021)
    Preview abstract In this work we study the problem of measuring the fairness of a machine learning model under noisy information. In many applications, evaluating a model according to a well-specified metric such as the FPR requires access to variables that cannot be jointly observed in a given practical setting. A standard workaround is to then use proxies for one or more of these variables. These proxies are either obtained using domain expertise or by training another machine learning model. Prior works have demonstrated the dangers of using such an approach, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates via proxies. In contrast, in this work we present a general theoretical framework that aims to characterize weaker conditions under which accurate model auditing is possible via the above approach. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts Epsilon_c and Epsilon_g. The first part depends on natural properties of the proxy such as precision and recall, whereas the second part captures correlations between different variables of interest. We show that in many scenarios the error in the estimates is dominated by the Epsilon_c via a linear dependence, whereas the dependence on the correlations only constitutes a lower order term. As a result we expand the understanding of scenarios where model auditing via proxies can be an effective approach. Finally, we compare via simulations the theoretical upper-bounds to the distribution of simulated estimation errors and show that both theoretical guarantees and empirical results significantly improve as we progressively enforce structure along the conditions highlighted by the theory. View details
    Preview abstract As multi-task models gain popularity in a wider range of machine learning applications, it is becoming increasingly important for practitioners to understand the fairness implications associated with those models. Most existing fairness literature focuses on learning a single task more fairly, while how ML fairness interacts with multiple tasks in the joint learning setting is largely under-explored. In this paper, we are concerned with how group fairness (e.g., equal opportunity, equalized odds) as an ML fairness concept plays out in the multi-task scenario. In multi-task learning, several tasks are learned jointly to exploit task correlations for a more efficient inductive transfer. This presents a multi-dimensional Pareto frontier on (1) the trade-off between group fairness and accuracy with respect to each task, as well as (2) the trade-offs across multiple tasks. We aim to provide a deeper understanding on how group fairness interacts with accuracy in multi-task learning, and we show that traditional approaches that mainly focus on optimizing the Pareto frontier of multi-task accuracy might not perform well on fairness goals. We propose a new set of metrics to better capture the multi-dimensional Pareto frontier of fairness-accuracy trade-offs uniquely presented in a multi-task learning setting. We further propose a Multi-Task-Aware Fairness (MTA-F) approach to improve fairness in multi-task learning. Experiments on several real-world datasets demonstrate the effectiveness of our proposed approach. View details
    Preview abstract Most literature in fairness has focused on improving fairness with respect to one single model or one single objective. However, real-world machine learning systems are usually composed of many different components. Unfortunately, recent research has shown that even if each component is "fair", the overall system can still be "unfair". In this paper, we focus on how well fairness composes over multiple components in real systems. We consider two recently proposed fairness metrics for rankings: exposure and pairwise ranking accuracy gap. We provide theory that demonstrates a set of conditions under which fairness of individual models does compose. We then present an analytical framework for both understanding whether a system's signals can achieve compositional fairness, and diagnosing which of these signals lowers the overall system's end-to-end fairness the most. Despite previously bleak theoretical results, on multiple data-sets -- including a large-scale real-world recommender system -- we find that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components. View details
    Preview abstract Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that ARL improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives. View details
    Debiasing Embeddings for Fairer Text Classification
    1st ACL Workshop on Gender Bias for Natural Language Processing(2019)
    Preview abstract (Bolukbasi et al., 2016) demonstrated that pre-trained word embeddings can inherit gender bias from the data they were trained on. We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al.,2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by providing a less noisy channel for communicating gender information. With a relatively minor adjustment, however, we show how these same techniques can be used to simultaneously reduce bias and obtain high classification accuracy. View details
    Preview abstract As recent literature has demonstrated how classifiers often carry unintended biases toward some subgroups, deploying machine learned models to users demands careful consideration of the social consequences. How should we address this problem in a real-world system? How should we balance core performance and fairness metrics? In this paper, we introduce a MinDiff framework for regularizing classifiers toward different fairness metrics and analyze a technique with kernel-based statistical dependency tests. We run a thorough study on an academic dataset to compare the Pareto frontier achieved by different regularization approaches, and apply our kernel-based method to two large-scale industrial systems demonstrating real-world improvements. View details