- Yanzhe Zhang
- Lu Jiang
- Greg Turk
- Diyi Yang
Text-to-image models, which can generate high-quality images from textual input, have recently enabled various content-creation tools. Despite their significant impact on a wide range of downstream applications, the distributions of the generated images have yet to be comprehensively understood, especially with respect to the potential stereotypical attributes of different genders. In this work, we propose a paradigm, namely Gender Presentation Differences, that utilizes fine-grained self-presentation attributes to study how different genders are presented in text-to-image models. By probing the gender indicators in the input text (e.g., "a woman" or "a man"), we quantify the frequency differences of human-centric attributes (e.g., "a shirt" and "a dress") through human annotation and introduce two novel metrics: the GEP (GEnder Presentation Differences) vector and the GEP score. Furthermore, the proposed automatic estimation of the two metrics correlates better with human annotations than existing CLIP-based measures, consistently across three state-of-the-art text-to-image models. Finally, we demonstrate that our metrics generalize to gender/racial stereotypes related to occupations.
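The abstract does not give formal definitions of the two metrics. A minimal sketch, under the assumption that the GEP vector collects per-attribute frequency differences between gender-indicated prompts and the GEP score aggregates them into a scalar, might look like:

```python
# Hypothetical sketch: the exact definitions of the GEP vector and GEP
# score are not specified in the abstract. We assume the vector holds
# per-attribute frequency differences and the score is their aggregate
# magnitude; the attribute names and frequencies below are illustrative.

def gep_vector(freq_woman, freq_man, attributes):
    """Per-attribute difference in occurrence frequency between images
    generated for "a woman" vs. "a man" prompts (frequencies in [0, 1])."""
    return [freq_woman[a] - freq_man[a] for a in attributes]

def gep_score(vector):
    """Aggregate per-attribute differences into one scalar
    (mean absolute difference, as one plausible choice)."""
    return sum(abs(d) for d in vector) / len(vector)

attributes = ["a shirt", "a dress", "a tie"]
freq_woman = {"a shirt": 0.35, "a dress": 0.60, "a tie": 0.02}
freq_man = {"a shirt": 0.80, "a dress": 0.01, "a tie": 0.30}

vec = gep_vector(freq_woman, freq_man, attributes)
score = gep_score(vec)
```

A larger GEP score would indicate a stronger overall presentation difference between the two gender indicators across the attribute set.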