Sunipa Dev
Sunipa Dev is a Senior Research Scientist in the Responsible AI and Human Centered Technologies organization. Her research centers on auditing generative AI systems for safety, fairness, and inclusivity.
Authored Publications
Preview abstract
Stereotypes are oversimplified beliefs and ideas about particular groups of people. These cognitive biases are omnipresent in our language, reflected in human-generated datasets, and potentially learned and perpetuated by language technologies. Although mitigating stereotypes in language technologies is necessary for preventing harms, stereotypes can impose varying levels of risk for targeted individuals and social groups depending on the contexts in which they appear. Technical challenges in detecting stereotypes are rooted in the societal nuances of stereotyping, making it impossible to capture all intertwined interactions of social groups in diverse cultural contexts in one generic benchmark. This paper delves into the nuances of detecting stereotypes in an annotation task with humans from various regions of the world. We iteratively disambiguate our definition of the task, refining it as detecting "generalizing language", and contribute a multilingual, annotated dataset consisting of sentences mentioning a wide range of social identities in 9 languages, labeled on whether they make broad statements and assumptions about those groups. We experiment with training generalizing language detection models, which provide insight into the linguistic contexts in which stereotypes can appear, facilitating future research in addressing the dynamic, social aspects of stereotypes.
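As a rough illustration of how such a generalizing-language detector could be used, the sketch below scores sentences with a binary sequence classifier. The base checkpoint (xlm-roberta-base) and the example sentences are assumptions for illustration, not the paper's released models or data; in practice the classifier would be fine-tuned on the annotated multilingual dataset described above.

```python
# Minimal sketch: scoring sentences with a binary "generalizing language" classifier.
# xlm-roberta-base is a stand-in base model; a real detector would be fine-tuned
# on the multilingual annotated dataset before use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

sentences = [
    "Nurses are caring.",        # broad statement about a social group
    "My neighbour is a nurse.",  # specific statement, not generalizing
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)
for sentence, p in zip(sentences, probs[:, 1].tolist()):
    print(f"{p:.2f}  {sentence}")  # probability that the sentence is generalizing
```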
Preview abstract
While large, generative, multilingual models are rapidly being developed and deployed, their safety and fairness evaluations primarily hinge on resources collected in English and a limited set of translations. This has been demonstrated to be insufficient, severely lacking in the nuances of unsafe language and stereotypes that are prevalent in different languages and the geographic regions where they occur. Gathering these resources at scale, across varied languages and regions, also poses a challenge, as it requires expansive sociolinguistic knowledge and can be prohibitively expensive. We utilize an established methodology of coupling LLM generations with distributed annotations to overcome these gaps and create SeeGULL Multilingual, a resource spanning 20 languages across 23 regions.
Preview abstract
Misgendering refers to the act of incorrectly identifying or addressing someone's gender.
While misgendering is both a factual inaccuracy and a toxic act of identity erasure, research on fact-checking and toxicity detection does not address it.
We are the first to bridge this gap by introducing a dataset to assist in developing interventions for misgendering.
The misgendering interventions task can be divided into two sub-tasks: (i) detecting misgendering, followed by (ii) editing misgendering where it is present, in domains where editing is appropriate.
The dataset contains a total of 3,806 instances of tweets, YouTube comments, and LLM-generated text about 30 non-cisgender individuals, annotated for whether or not they contain misgendering.
LLM-generated text is also annotated for edits required to fix misgendering.
Using this dataset, we set initial benchmarks by evaluating existing NLP systems and highlight challenges for future models to address.
Additionally, we conducted a survey of non-cisgender individuals in the US to understand opinions about automated interventions for text-based misgendering.
We find interest in such interventions, along with concerns about potential harm.
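As a toy sketch of the first sub-task (detection) only, and not the paper's annotation scheme or any of the benchmarked systems, one can flag third-person pronouns in a text that fall outside a person's declared pronoun set. The pronoun inventory and the example below are illustrative assumptions, and a heuristic like this ignores context (e.g., pronouns referring to other people).

```python
import re

# Map pronoun forms to a canonical family (illustrative, not exhaustive).
PRONOUN_FAMILIES = {
    "he": "he", "him": "he", "his": "he", "himself": "he",
    "she": "she", "her": "she", "hers": "she", "herself": "she",
    "they": "they", "them": "they", "their": "they", "theirs": "they", "themself": "they",
    "xe": "xe", "xem": "xe", "xyr": "xe", "xyrs": "xe", "xemself": "xe",
}

def flag_possible_misgendering(text: str, declared_families: set[str]) -> list[str]:
    """Return third-person pronouns in `text` whose family is not declared."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens
            if t in PRONOUN_FAMILIES and PRONOUN_FAMILIES[t] not in declared_families]

# Example: a person who uses xe/xem is referred to with "he" -> flagged for review.
print(flag_possible_misgendering("He said xe would join later.", {"xe"}))  # ['he']
```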
ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation
Akshita Jha
Sarah Laszlo
Rida Qadri
Chandan Reddy
ACL (2024)
Preview abstract
Recent studies have highlighted the issue of varying degrees of stereotypical depictions for different identity groups. However, these existing approaches have several key limitations, including a noticeable lack of coverage of identity groups in their evaluation and of the range of their associated stereotypes. Additionally, these studies often lack a critical distinction between inherently visual stereotypes, such as 'brown' or 'sombrero', and culturally influenced stereotypes like 'kind' or 'intelligent'. In this work, we address these limitations by grounding our evaluation of regional, geo-cultural stereotypes in images generated by text-to-image models, leveraging existing textual resources. We employ existing stereotype benchmarks and focus exclusively on the identification of visual stereotypes within the generated images, spanning 135 identity groups. We also compute offensiveness across identity groups and examine the feasibility of identifying stereotypes automatically. Further, through a detailed case study and quantitative analysis, we reveal how the default representations of all identity groups have a more stereotypical appearance and how, for historically marginalized groups, the images across different attributes are visually more similar than for other groups, even when explicitly prompted otherwise.
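One loose way to probe generated images for inherently visual attributes, in the spirit of this evaluation but not the paper's actual pipeline, attribute lists, or thresholds, is to score image-text similarity with an off-the-shelf CLIP model. The checkpoint, the attribute phrases, and the image path below are illustrative assumptions.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical visual attribute phrases; culturally influenced but non-visual
# attributes (e.g., "kind", "intelligent") are deliberately excluded.
visual_attributes = ["a person wearing a sombrero", "a person in a business suit"]
image = Image.open("generated_image.png")  # an image produced by a text-to-image model

inputs = processor(text=visual_attributes, images=image, return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image.softmax(dim=-1)
for phrase, score in zip(visual_attributes, scores[0].tolist()):
    print(f"{score:.2f}  {phrase}")
```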
Preview abstract
Misgendering is the act of referring to someone in a way that does not reflect their gender identity. Translation systems, including foundation models capable of translation, can produce errors that result in misgendering harms. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both dedicated neural machine translation systems and foundation models, and show that all systems exhibit errors resulting in misgendering harms, even in high-resource languages.
Preview abstract
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given model?
In this work, we study this question by contrasting social biases with non-social biases that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions of a given benchmark based on innocuous modifications (such as paraphrasing or random sampling) that maintain the essence of its social bias.
On two well-known social bias benchmarks, Winogender (Rudinger et al., 2019) and BiasNLI (Dev et al., 2020), we observe that the choice of these shallow modifications has a surprising effect on the resulting degree of bias across various models.
We hope these troubling observations motivate more robust measures of social biases.
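The underlying concern, that shallow re-constructions of a benchmark can swing the measured bias, can be sketched as follows. Here `bias_score` is a stand-in for whatever metric a given benchmark reports, and random subsampling is one of the innocuous modifications mentioned above; the function and parameters are illustrative, not the study's protocol.

```python
import random
import statistics
from typing import Callable, Sequence

def bias_under_perturbation(
    benchmark: Sequence[dict],
    bias_score: Callable[[Sequence[dict]], float],  # stand-in for a benchmark's metric
    n_trials: int = 50,
    keep_fraction: float = 0.8,
    seed: int = 0,
) -> tuple[float, float]:
    """Re-score a model on randomly subsampled constructions of a benchmark.

    A large spread across trials suggests the reported bias score is sensitive to
    shallow construction choices rather than reflecting a stable model property.
    """
    rng = random.Random(seed)
    k = int(len(benchmark) * keep_fraction)
    scores = [bias_score(rng.sample(list(benchmark), k)) for _ in range(n_trials)]
    return statistics.mean(scores), statistics.stdev(scores)
```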
Building Stereotype Repositories with Complementary Approaches
Akshita Jha
Jaya Goyal
Dinesh Tewari
C3NLP workshop at EACL 2023 (2023)
Preview abstract
Measurements of fairness in NLP have been critiqued for lacking concrete definitions of the biases or harms measured, and for perpetuating a singular, Western narrative of fairness globally. To combat some of these pivotal issues, methods for curating datasets and benchmarks that target specific harms are rapidly emerging. However, these methods still face the significant challenge of achieving coverage over global cultures and perspectives at scale. To address this, in this paper, we highlight the utility and importance of complementary approaches in these curation strategies, which leverage both community engagement and large generative models. We specifically target the harm of stereotyping and demonstrate a pathway to build a benchmark that covers stereotypes about diverse and intersectional identities.
Preview abstract
Gender bias in language technologies has been widely studied, but research has mostly been restricted to a binary paradigm of gender.
It is important to also consider non-binary gender identities, as excluding them can cause further harm to an already marginalized group.
One way in which English-speaking individuals linguistically encode their gender identity is through third-person personal pronoun declarations.
This is often done using two or more pronoun forms, e.g., xe/xem or xe/xem/xyr.
In this paper, we comprehensively evaluate state-of-the-art language models for their ability to correctly use declared third-person personal pronouns.
As far as we are aware, we are the first to do so.
We evaluate language models in both zero-shot and few-shot settings.
In the zero-shot setting, models are still far from gendering non-binary individuals accurately, and most also struggle with correctly using gender-neutral pronouns (singular they, them, their, etc.).
This poor performance may be due to the lack of representation of non-binary pronouns in pre-training corpora, and some memorized associations between pronouns and names.
We find an overall improvement in performance for non-binary pronouns when using in-context learning, demonstrating that language models with few-shot capabilities can adapt to using declared pronouns correctly.
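A rough sketch of the few-shot setup described here: build a prompt that declares a person's pronouns, ask the model under evaluation to continue a sentence, and check whether the completion uses a declared form. The `generate` call is a placeholder for whichever language model API is being evaluated, and the names, examples, and pronoun sets are illustrative assumptions, not the paper's evaluation templates.

```python
import re

FEW_SHOT_EXAMPLES = [
    ("Sam uses they/them pronouns. Sam lost a glove, so", "they went back to look for it."),
    ("Ari uses xe/xem pronouns. Ari loves hiking, so", "xe spends weekends on the trail."),
]

def build_prompt(declaration: str, stem: str) -> str:
    shots = "\n".join(f"{p} {c}" for p, c in FEW_SHOT_EXAMPLES)
    return f"{shots}\n{declaration} {stem}"

def uses_declared_pronoun(completion: str, declared_forms: set[str]) -> bool:
    tokens = set(re.findall(r"[a-z]+", completion.lower()))
    return bool(tokens & declared_forms)

prompt = build_prompt("Kai uses ze/zir pronouns.", "Kai finished the race early, so")
# completion = generate(prompt)                 # placeholder call to the model under test
completion = "ze celebrated with zir friends."  # example output for illustration only
print(uses_declared_pronoun(completion, {"ze", "zir"}))  # True
```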
Preview abstract
Along with the recent advances in large language modeling, there is growing concern that language technologies may reflect, propagate, and amplify various social stereotypes about groups of people. Publicly available stereotype benchmarks play a crucial role in detecting and mitigating this issue in language technologies to prevent both representational and allocational harms in downstream applications. However, existing stereotype benchmarks are limited in their size and coverage, largely restricted to stereotypes prevalent in Western society. This is especially problematic as language technologies are gaining hold across the globe. To address this gap, we present SeeGULL, a broad-coverage stereotype dataset that expands coverage by utilizing the generative capabilities of large language models such as PaLM and GPT-3, and leverages a globally diverse rater pool to validate the prevalence of those stereotypes in society. SeeGULL is an order of magnitude larger than existing benchmarks and contains stereotypes for 179 identity groups spanning 6 continents, 8 regions, 178 countries, 50 US states, and 31 Indian states and union territories. We also obtain fine-grained offensiveness scores for different stereotypes and demonstrate how stereotype perceptions for the same identity group differ between in-region and out-of-region annotators.
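At a high level, the generation step pairs identity terms with attribute candidates elicited repeatedly from a large language model, and the recurring candidates are then validated (and scored for offensiveness) by a geographically diverse rater pool. The sketch below outlines that loop; `llm_complete` is a placeholder for the generative model call, and the prompt wording and thresholds are assumptions, not the exact prompts used to build SeeGULL.

```python
from collections import Counter

def elicit_candidates(identities, llm_complete, n_samples=5):
    """Collect (identity, attribute) candidate pairs from repeated LLM completions.

    Candidates are only potential stereotypes; in a SeeGULL-style pipeline they are
    subsequently validated by in-region and out-of-region human annotators.
    """
    counts = Counter()
    for identity in identities:
        prompt = f"List common one-word attributes associated with people from {identity}:"
        for _ in range(n_samples):
            text = llm_complete(prompt)  # placeholder for the generative model call
            for attr in text.lower().replace(",", " ").split():
                counts[(identity, attr)] += 1
    # Keep pairs that recur across samples as candidates for human validation.
    return [pair for pair, count in counts.items() if count >= 2]
```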
"I wouldn’t say offensive but...": Disability-Centered Perspectives on Large Language Models
Vinitha Gadiraju
Alex Taylor
Robin Brewer
Proceedings of FAccT 2023 (2023) (to appear)
Preview abstract
Large language models (LLMs) trained on real-world data can inadvertently reflect harmful societal biases, particularly toward historically marginalized communities. While previous work has primarily focused on harms related to age and race, emerging research has shown that biases toward disabled communities exist. This study extends prior work exploring the existence of harms by identifying categories of LLM-perpetuated harms toward the disability community. We conducted 19 focus groups, during which 56 participants with disabilities probed a dialog model about disability and discussed and annotated its responses. Participants rarely characterized model outputs as blatantly offensive or toxic. Instead, participants used nuanced language to detail how the dialog model mirrored subtle yet harmful stereotypes they encountered in their lives and dominant media, e.g., inspiration porn and able-bodied saviors. Participants often implicated training data as a cause for these stereotypes and recommended training the model on diverse identities from disability-positive resources. Our discussion further explores representative data strategies to mitigate harm related to different communities through annotation co-design with ML researchers and developers.