Utsav Prabhu
I am a software engineer at Google working primarily on people-centric perception tech. I received my PhD from Carnegie Mellon university.
Research Areas
Authored Publications
Sort By
Preview abstract
Use of Text-to-Image models is expanding beyond generating generic objects, as they are increasingly being adopted by diverse global communities to create visual representations of their unique culture. Current T2I benchmarks primarily evaluate image-text alignment, aesthetics and fidelity of generations for complex prompts with generic objects, overlooking the critical dimension of cultural understanding. In this work, we address this gap by defining a framework to evaluate cultural competence of T2I models, and present a scalable approach to collect cultural artifacts unique to a particular culture from Knowledge Graphs and Large Language Models in tandem. We assess the ability of state-of-the-art T2I models to generate culturally faithful and realistic images across 8 countries and 3 cultural domains. Furthermore, we emphasize the importance of T2I models reflecting a culture's diversity and introduce cultural diversity as a novel metric for T2I evaluation, drawing inspiration from the Vendi Score. We introduce T2I-GCube, a first-of-its-kind benchmark for T2I evaluation. T2I-GCube includes cultural prompts, metrics, and cultural concept spaces, enabling comprehensive assessment of T2I models' cultural knowledge and diversity. Our evaluations reveal significant gaps in the cultural knowledge of existing models and provide valuable insights into the diversity of image outputs for under-specified prompts. By introducing a novel approach to evaluating cultural diversity and knowledge in T2I models, T2I-GCube will be instrumental in fostering the development of models with enhanced cultural competence.
View details
Preview abstract
Use of Text-to-Image models is expanding beyond generating generic objects, as they are increasingly being adopted by diverse global communities to create visual representations of their unique culture. Current T2I benchmarks primarily evaluate image-text alignment, aesthetics and fidelity of generations for complex prompts with generic objects, overlooking the critical dimension of cultural understanding. In this work, we address this gap by defining a framework to evaluate cultural competence of T2I models, and present a scalable approach to collect cultural artifacts unique to a particular culture from Knowledge Graphs and Large Language Models in tandem. We assess the ability of state-of-the-art T2I models to generate culturally faithful and realistic images across 8 countries and 3 cultural domains. Furthermore, we emphasize the importance of T2I models reflecting a culture's diversity and introduce cultural diversity as a novel metric for T2I evaluation, drawing inspiration from the Vendi Score. We introduce T2I-GCube, a first-of-its-kind benchmark for T2I evaluation. T2I-GCube includes cultural prompts, metrics, and cultural concept spaces, enabling comprehensive assessment of T2I models' cultural knowledge and diversity. Our evaluations reveal significant gaps in the cultural knowledge of existing models and provide valuable insights into the diversity of image outputs for under-specified prompts. By introducing a novel approach to evaluating cultural diversity and knowledge in T2I models, T2I-GCube will be instrumental in fostering the development of models with enhanced cultural competence.
View details
Preview abstract
Use of Text-to-Image models is expanding beyond generating generic objects, as they are increasingly being adopted by diverse global communities to create visual representations of their unique culture. Current T2I benchmarks primarily evaluate image-text alignment, aesthetics and fidelity of generations for complex prompts with generic objects, overlooking the critical dimension of cultural understanding. In this work, we address this gap by defining a framework to evaluate cultural competence of T2I models, and present a scalable approach to collect cultural artifacts unique to a particular culture from Knowledge Graphs and Large Language Models in tandem. We assess the ability of state-of-the-art T2I models to generate culturally faithful and realistic images across 8 countries and 3 cultural domains. Furthermore, we emphasize the importance of T2I models reflecting a culture's diversity and introduce cultural diversity as a novel metric for T2I evaluation, drawing inspiration from the Vendi Score. We introduce T2I-GCube, a first-of-its-kind benchmark for T2I evaluation. T2I-GCube includes cultural prompts, metrics, and cultural concept spaces, enabling comprehensive assessment of T2I models' cultural knowledge and diversity. Our evaluations reveal significant gaps in the cultural knowledge of existing models and provide valuable insights into the diversity of image outputs for under-specified prompts. By introducing a novel approach to evaluating cultural diversity and knowledge in T2I models, T2I-GCube will be instrumental in fostering the development of models with enhanced cultural competence.
View details
A Step Toward More Inclusive People Annotations for Fairness
Vittorio Ferrari
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (2021)
Preview abstract
The Open Images Dataset contains approximately 9 million images and is a widely accepted dataset for computer vision research. As is common practice for large datasets, the annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image. In this paper, we present a new set of annotations on a subset of the Open Images dataset called the ``MIAP (More Inclusive Annotations for People)'' subset, containing bounding boxes and attributes for all of the people visible in those images. The attributes and labeling methodology for the ``MIAP'' subset were designed to enable research into model fairness. In addition, we analyze the original annotation methodology for the person class and its subclasses, discussing the resulting patterns in order to inform future annotation efforts. By considering both the original and exhaustive annotation sets, researchers can also now study how systematic patterns in training annotations affect modeling.
View details