
Roopal Garg
Roopal Garg is a Staff Software Engineer at Google DeepMind, where he works on improving multimodal content understanding. His focus is on building high-quality datasets efficiently and using them to develop autoraters and metrics to advance and evaluate modern Vision-Language Models (VLMs). Recently, he has worked on hyper-detailed image descriptions, their implications for text-to-image models, and the expansion of these capabilities to include global geographical and cultural understanding. He received his MS in Computer Science with a focus on Natural Language Processing from the University of Southern California in 2013.
Authored Publications
DOCCI: Descriptions of Connected and Contrasting Images
Garrett Tanzer
Jaemin Cho
Su Wang
Sunayana Rane
Zack Berger
Zarana Parekh
(2024)
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Andrew Bunner
Ranjay Krishna
(2024)
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon
Dani Lischinski
Daniel Cohen-Or
arXiv (2023)