Perception

We design systems that enable computers to "understand" the world, via a range of modalities including audio, image, and video understanding.

About the team

The Perception team is a group focused on building systems that can interpret sensory data such as images, sound, and video. Our research helps power many products across Google: image and video understanding in Search and Google Photos, computational photography for Pixel phones and Google Maps, machine learning APIs for Google Cloud and YouTube, accessibility technologies like Live Transcribe, applications in the Nest Hub Max, mobile augmented reality experiences in Duo video calls, and more.

We actively contribute to the open source and research communities, providing media processing technologies (e.g., MediaPipe) that enable building computer vision applications with TensorFlow. We have also released several large-scale datasets for machine learning, including AudioSet, AVA, Open Images, and YouTube-8M.

In doing all this, we adhere to AI principles to ensure that these technologies work well for everyone. We value innovation, collaboration, respect, and building an inclusive and diverse team and research community, and we work closely with the PAIR team to build ML Fairness frameworks.

Featured publications

(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding
Manaal Faruqui
Gokhan Tur
Dilek Hakkani-Tur
Larry Heck
Proceedings of the IEEE ICASSP (2018)
Abstract: Spoken language understanding (SLU) is the component of goal-oriented dialogue systems that aims to interpret a user's natural language queries into the system's semantic representation format. While current state-of-the-art SLU approaches achieve high performance for English domains, the same is not true for other languages. Approaches in the literature for extending SLU models and grammars to new languages rely primarily on machine translation. This poses a challenge in scaling to new languages, as machine translation systems may not be reliable for several (especially low-resource) languages. In this work, we examine different approaches to train an SLU component with little supervision for two new languages, Hindi and Turkish, and show that with only a few hundred labeled examples we can surpass the approaches proposed in the literature. Our experiments show that training a model bilingually (i.e., jointly with English) enables faster learning, in that the model requires fewer labeled instances in the target language to generalize. Qualitative analysis shows that rare slot types benefit the most from the bilingual training.
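
As a rough illustration of the bilingual training idea above, the sketch below pools a large English training set with a few hundred target-language examples so that a single slot-filling model sees both. The function name, oversampling factor, and example format are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of bilingual (English + target language) data pooling for an
# SLU slot-filling model. The oversampling factor and example format are
# illustrative assumptions, not the paper's exact recipe.
import random

def mix_bilingual(english_data, target_data, oversample_factor=10, seed=0):
    """Pool many English examples with a few hundred target-language examples,
    repeating the target examples so they are not drowned out during training."""
    rng = random.Random(seed)
    pooled = list(english_data) + list(target_data) * oversample_factor
    rng.shuffle(pooled)
    return pooled

# Each example: (tokens, slot_labels). A single shared model is then trained on `train_set`.
english = [(["book", "a", "table", "for", "two"],
            ["O", "O", "O", "O", "B-party_size"])] * 1000
hindi = [(["do", "logon", "ke", "liye", "table"],
          ["B-party_size", "I-party_size", "O", "O", "O"])] * 300
train_set = mix_bilingual(english, hindi)
```
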
Aperture Supervision for Monocular Depth Estimation
Pratul Srinivasan
Rahul Garg
Neal Wadhwa
Ren Ng
CVPR (2018) (to appear)
Abstract: We present a novel method to train machine learning algorithms to estimate scene depths from a single image, by using the information provided by a camera's aperture as supervision. Prior works use a depth sensor's outputs or images of the same scene from alternate viewpoints as supervision, while our method instead uses images from the same viewpoint taken with a varying camera aperture. To enable learning algorithms to use aperture effects as supervision, we introduce two differentiable aperture rendering functions that use the input image and predicted depths to simulate the depth-of-field effects caused by real camera apertures. We train a monocular depth estimation network end-to-end to predict the scene depths that best explain these finite aperture images as defocus-blurred renderings of the input all-in-focus image.
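
A minimal NumPy sketch of the kind of aperture-based supervision the paper describes: blur the all-in-focus input with a few discrete kernel sizes, blend them per pixel according to the predicted depth, and penalize the difference against a real shallow-depth-of-field photo. The Gaussian blur model, depth normalization, and all names here are simplifying assumptions; the paper's rendering functions are differentiable so gradients can reach the depth network.

```python
# Sketch of depth-dependent defocus rendering used as a training signal.
# Assumes depth values in [0, 1] and an (H, W, 3) all-in-focus image.
import numpy as np
from scipy.ndimage import gaussian_filter

def render_defocus(all_in_focus, depth, focal_depth=0.5, max_sigma=4.0):
    """Approximate defocus: blur radius grows with |depth - focal_depth|."""
    sigmas = np.linspace(0.0, max_sigma, 5)
    blur_stack = np.stack(
        [gaussian_filter(all_in_focus, sigma=(s, s, 0)) for s in sigmas])
    # Per-pixel target blur level, then soft weights over the discrete levels.
    target = np.abs(depth - focal_depth) / (1.0 - focal_depth) * max_sigma
    weights = np.exp(-(sigmas[:, None, None] - target[None]) ** 2)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * blur_stack).sum(axis=0)

def aperture_loss(all_in_focus, predicted_depth, shallow_dof_photo):
    """Penalize mismatch between the simulated and the real finite-aperture image."""
    rendered = render_defocus(all_in_focus, predicted_depth)
    return np.mean((rendered - shallow_dof_photo) ** 2)
```
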
Best Linear Adaptive Enhancement (BLADE)
Abstract: The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of image processing problems. We show applications to denoising, compression artifact removal, demosaicing, and approximation of anisotropic diffusion equations.
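
A rough sketch of the trainable edge-adaptive filtering idea: hash each pixel's local gradient orientation into a bucket and apply the linear filter trained for that bucket. The bucketing scheme, filter-bank shape, and grayscale assumption are illustrative, not the paper's exact formulation.

```python
# Sketch of edge-adaptive filtering with a bank of trained linear filters,
# selected per pixel by quantized gradient orientation (a simplification).
import numpy as np
from scipy.ndimage import convolve, sobel

def blade_like_filter(image, filter_bank):
    """image: (H, W) grayscale; filter_bank: (B, k, k) trained filters (assumed given)."""
    n_buckets = filter_bank.shape[0]
    gx, gy = sobel(image, axis=1), sobel(image, axis=0)
    orientation = np.arctan2(gy, gx)  # in [-pi, pi]
    bucket = (orientation + np.pi) / (2 * np.pi) * n_buckets
    bucket = np.clip(bucket.astype(int), 0, n_buckets - 1)

    # Filter the image once per bucket, then gather the right output per pixel.
    filtered = np.stack([convolve(image, f) for f in filter_bank])
    rows, cols = np.indices(image.shape)
    return filtered[bucket, rows, cols]
```
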
Burst Denoising with Kernel Prediction Networks
Ben Mildenhall
Jiawen Chen
Dillon Sharlet
Ren Ng
Rob Carroll
CVPR (2018) (to appear)
Abstract: We present a technique for jointly denoising bursts of images taken from a handheld camera. In particular, we propose a convolutional neural network architecture for predicting spatially varying kernels that can both align and denoise frames, a synthetic data generation approach based on a realistic noise formation model, and an optimization guided by an annealed loss function to avoid undesirable local minima. Our model matches or outperforms the state-of-the-art across a wide range of noise levels on both real and synthetic data.
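
A minimal sketch of the kernel-application step implied by the abstract: given per-pixel, per-frame kernels (assumed here to come from the predicting network), each output pixel is a weighted sum over a small window of every burst frame, which can simultaneously align and denoise. Shapes, names, and the plain averaging over frames are illustrative assumptions.

```python
# Sketch: apply spatially varying predicted kernels to a burst of noisy frames.
import numpy as np

def apply_predicted_kernels(burst, kernels):
    """burst: (N, H, W) noisy frames; kernels: (N, H, W, k, k) predicted weights.
    Returns the (H, W) merged, denoised image."""
    n, h, w = burst.shape
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(burst, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    out = np.zeros((h, w))
    for i in range(n):            # for each frame in the burst
        for dy in range(k):       # for each kernel tap
            for dx in range(k):
                out += kernels[i, :, :, dy, dx] * padded[i, dy:dy + h, dx:dx + w]
    return out / n
```
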
COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar
Vittorio Ferrari
CVPR (2018) (to appear)
Abstract: Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While many classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow us to explain important aspects of an image, including (1) scene type; (2) which thing classes are likely to be present and their location (through contextual reasoning); (3) physical attributes, material types and geometric properties of the scene. To understand stuff and things in context we introduce COCO-Stuff, which augments 120,000 images of the COCO dataset with pixel-wise annotations for 91 stuff classes. We introduce an efficient stuff annotation protocol based on superpixels, which leverages the original thing annotations. We quantify the speed versus quality trade-off of our protocol and explore the relation between annotation time and boundary complexity. Furthermore, we use COCO-Stuff to analyze: (a) the importance of stuff and thing classes in terms of their surface cover and how frequently they are mentioned in image captions; (b) the spatial relations between stuff and things, highlighting the rich contextual relations that make our dataset unique; (c) the performance of a modern semantic segmentation method on stuff and thing classes, and whether stuff is easier to segment than things.
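
As a small illustration of the surface-cover analysis mentioned in (a), the sketch below computes what fraction of a pixel-wise label map is covered by stuff versus thing classes; the label-map format and the stuff/thing id split are hypothetical, not the dataset's actual ids.

```python
# Hypothetical sketch: fraction of image pixels covered by stuff vs. thing
# classes, assuming every pixel carries either a stuff or a thing class id.
import numpy as np

def surface_cover(label_map, stuff_ids):
    """label_map: (H, W) array of integer class ids; stuff_ids: iterable of stuff class ids."""
    is_stuff = np.isin(label_map, list(stuff_ids))
    stuff_frac = float(is_stuff.mean())
    return {"stuff": stuff_frac, "thing": 1.0 - stuff_frac}
```
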
Decoding the auditory brain with canonical component analysis
Alain de Cheveigné
Daniel D. E. Wong
Giovanni M. Di Liberto
Jens Hjortkjaer
Malcolm Slaney
Edmund Lalor
NeuroImage (2018)
Abstract: The relation between a stimulus and the evoked brain response can shed light on perceptual processes within the brain. Signals derived from this relation can also be harnessed to control external devices for Brain Computer Interface (BCI) applications. While the classic event-related potential (ERP) is appropriate for isolated stimuli, more sophisticated "decoding" strategies are needed to address continuous stimuli such as speech, music or environmental sounds. Here we describe an approach based on Canonical Correlation Analysis (CCA) that finds the optimal transform to apply to both the stimulus and the response to reveal correlations between the two. Compared to prior methods based on forward or backward models for stimulus-response mapping, CCA finds significantly higher correlation scores, thus providing increased sensitivity to relatively small effects, and supports classifier schemes that yield higher classification scores. CCA strips the brain response of variance unrelated to the stimulus, and the stimulus representation of variance that does not affect the response, and thus improves observations of the relation between stimulus and response.
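
A minimal sketch of the CCA idea described above, using scikit-learn's CCA in place of the authors' implementation: find linear transforms of a time-lagged stimulus feature (e.g. the speech envelope) and of the multichannel brain response that are maximally correlated. The lag count, component count, and toy data are arbitrary assumptions.

```python
# Sketch: canonical correlation between a lagged stimulus feature and a
# multichannel neural response, with toy random data standing in for recordings.
import numpy as np
from sklearn.cross_decomposition import CCA

def lagged(x, n_lags):
    """Stack time-lagged copies of a 1-D stimulus feature (e.g. its envelope)."""
    return np.stack([np.roll(x, lag) for lag in range(n_lags)], axis=1)

# Toy data: 10 s of a stimulus envelope and a 32-channel response at 100 Hz.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(1000)
response = rng.standard_normal((1000, 32))

cca = CCA(n_components=4)
stim_cc, resp_cc = cca.fit_transform(lagged(envelope, n_lags=20), response)

# Correlation of each canonical component pair (the scores CCA maximizes).
corrs = [np.corrcoef(stim_cc[:, i], resp_cc[:, i])[0, 1] for i in range(4)]
```
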

Highlighted projects