Roopal Garg

Roopal Garg is a Staff Software Engineer at Google DeepMind, where he works on improving multimodal content understanding. His focus is on building high-quality datasets efficiently and using them to develop autoraters and metrics to advance and evaluate modern Vision-Language Models (VLMs). Recently, he's been focused on hyper-detailed image descriptions, their implications for text-to-image models, and expanding these capabilities to include global geographical and cultural understanding. He received his MS in Computer Science with a focus on Natural Language Processing from the University of Southern California in 2013.

Research Areas

Natural language processing

Authored Publications

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

(2024)

DOCCI: Descriptions of Connected and Contrasting Images

Alex Ku, Garrett Tanzer, Jaemin Cho, Jason Baldridge, Jordi Pont-Tuset, Roopal Garg, Su Wang, Sunayana Rane, Yasumasa Onoe, Yonatan Bitton, Zack Berger, Zarana Parekh

(2024)

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Brian Gordon, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor, Roopal Garg, Xi Chen, Yonatan Bitton

arXiv (2023)
