- Yanzhe Zhang
- Lu Jiang
- Greg Turk
- Diyi Yang
Abstract
Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools. Despite significantly affecting a wide range of downstream applications, the distributions of these generated images still need to be comprehensively understood, especially regarding the potential stereotypical attributes of different genders. In this work, we propose a paradigm that utilizes fine-grained self-presentation attributes to study how different genders are presented differently in text-to-image models, namely Gender Presentation Differences. By probing the gender indicators in the input text (e.g., a woman'' or
a man''), we quantify the frequency differences of human-centric attributes (e.g., a shirt'' and
a dress'') through human annotation and introduce two novel metrics: GEP (GEnder Presentation Differences) vector and GEP score. Furthermore, the proposed automatic estimation of the two metrics correlates better with human annotations than existing CLIP-based measures, consistently across three state-of-the-art text-to-image models. Finally, we demonstrate that our metrics can generalize to gender/racial stereotypes related to occupations.
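The core idea of comparing attribute frequencies across gender-indicated prompts can be sketched in a few lines. The snippet below is a minimal illustration only: the attribute list, the toy annotations, and the mean-absolute-difference aggregation are assumptions made for the example, not the exact GEP definitions from the paper.

```python
from collections import Counter

# Hypothetical per-image annotations: for each prompt's gender indicator,
# the set of human-centric attributes annotators marked as present.
annotations = {
    "a woman": [{"a dress", "long hair"}, {"a shirt"}, {"a dress"}],
    "a man":   [{"a shirt"}, {"a shirt", "a tie"}, {"long hair"}],
}
attributes = ["a shirt", "a dress", "a tie", "long hair"]

def attribute_frequencies(image_attrs, attributes):
    """Fraction of images in which each attribute appears."""
    counts = Counter()
    for attrs in image_attrs:
        counts.update(attrs)
    n = len(image_attrs)
    return {a: counts[a] / n for a in attributes}

freq_woman = attribute_frequencies(annotations["a woman"], attributes)
freq_man = attribute_frequencies(annotations["a man"], attributes)

# Per-attribute frequency difference between the two prompts
# (the role played by the GEP vector).
gep_vector = {a: freq_woman[a] - freq_man[a] for a in attributes}

# A single scalar summarizing the differences; a simple mean absolute
# difference is used here purely for illustration.
gep_score = sum(abs(d) for d in gep_vector.values()) / len(attributes)

print(gep_vector)
print(f"Illustrative difference score: {gep_score:.3f}")
```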