Arsha Nagrani

I joined Google as a Research Scientist in 2020, after completing my PhD in Computer Vision at the University of Oxford, UK. My research is focused on multimodal machine learning techniques for video understanding, including using sound and text to learn better representations. Check out my homepage for more about me and my research.
Authored Publications
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Ramin Mehran
Rachel Hornung
Nitesh Bharadwaj Gundavarapu
Nilpa Jha
Austin Myers
Xingyi Zhou
Boqing Gong
Yukun Zhu
ArXiv (2024)
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Carlos Riquelme
Sebastian Goodman
Yi Tay
Siamak Shakeri
Daniel Salz
Michael Tschannen
Hexiang (Frank) Hu
Mandar Joshi
Matthias Minderer
Filip Pavetić
Gang Li
Lucas Beyer
Anurag Arnab
Yuanzhong Xu
Keran Rong
Alexander Kolesnikov
Xiaohua Zhai
Neil Houlsby
Computer Vision and Pattern Recognition Conference (CVPR) (2024)
UnLoc: a unified framework for video localization tasks
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhonghao Wang
Weina Ge
International Conference on Computer Vision (2023)
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur
Paul Hongsuck Seo
Karteek Alahari
Interspeech (2022)
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Karteek Alahari
Winter Conference on Applications of Computer Vision (WACV) (2022) (to appear)
Learning Audio-Video Modalities from Image Captions
Paul Hongsuck Seo
Anja Hauth
Santiago Manen
European Conference on Computer Vision (2022)