Machine perception

Research in machine perception tackles the hard problems of understanding images, sounds, music, and video. In recent years, computers have become much better at such tasks, enabling a variety of new applications, such as content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially labeled datasets on parallel computing clusters. A good example is our recent work on object recognition using Inception, a novel deep convolutional neural network architecture that achieves state-of-the-art results on academic benchmarks and lets users easily search their large Google Photos collections. The ability to mine meaningful information from multimedia is applied broadly throughout Google.

Recent Publications

Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates

Shengjie Zhu, Ahmed Abdelkader, Mark Matthews, Xiaoming Liu, Vincent Chu

International Conference on 3D Vision (2026)

On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration

Yehonathan Refael, Amit Aides, Aviad Barzilai, George Leifman, Vered Silverman, Bolous Jaber, Tomer Shekel, Genady Beryozkin

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 886-894

Gaze Target Estimation Anywhere with Concepts

Xu Cao, Houze Yang, Vipin Gunda, Zhongyi Zhou, Tianyu Xu, Adarsh Kowdle, Inki Kim, Jim Rehg

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)

VISTA: A Test-Time Self-Improving Video Generation Agent

Xuan Long Do, Xingchen Wan, Hootan Nakhost, Chen-Yu Lee, Tomas Pfister, Sercan Arik

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)

Mull-Tokens: Modality-Agnostic Latent Thinking

Arijit Ray, Ahmed Abdelkader, Chengzhi Mao, Bryan A. Plummer, Kate Saenko, Ranjay Krishna, Leonidas Guibas, Vincent Chu

IEEE/CVF Conference on Computer Vision and Pattern Recognition (Findings) (to appear) (2026)

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF

Carlos Tejeda-Ocampo, Toni Hirvonen, Ema Souza-Blanes, Mahmoud Namazi, Jan Skoglund

AES 158th Convention of the Audio Engineering Society (2025)
