Authored Publications
Google
Learning Audio-Video Modalities from Image Captions
Paul Hongsuck Seo
Anja Hauth
Santiago Manen
European Conference on Computer Vision (2022)
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) (2022)
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Karteek Alahari
Winter Conference on Applications of Computer Vision (WACV) (2022) (to appear)
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Anna Rohrbach
Medhini Narasimhan
Trevor Darrell
European Conference on Computer Vision (2022)
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur
Paul Hongsuck Seo
Karteek Alahari
Interspeech (2022)
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Karteek Alahari
European Conference on Computer Vision (ECCV) (2020)