MediaPipe: A Framework for Perceiving and Processing Reality

Camillo Lugaresi
Jiuqiang Tang
Hadon Nash
Chris McClanahan
Esha Uboweja
Michael Hays
Fan Zhang
Chuo-Ling Chang
Ming Yong
Wan-Teh Chang
Wei Hua
Manfred Georg
Third Workshop on Computer Vision for AR/VR at IEEE Computer Vision and Pattern Recognition (CVPR) 2019

Abstract

Building an application that processes perceptual inputs involves more than running an ML model. Developers have to harness the capabilities of a wide range of devices; balance resource usage and quality of results; run multiple operations in parallel and with pipelining; and ensure that time-series data is properly synchronized. The MediaPipe framework addresses these challenges. A developer can use MediaPipe to easily and rapidly combine existing and new perception components into prototypes and advance them to polished cross-platform applications. The developer can configure an application built with MediaPipe to manage resources efficiently (both CPU and GPU) for low latency performance, to handle synchronization of time-series data such as audio and video frames and to measure performance and resource consumption. We show that these features enable a developer to focus on the algorithm or model development, and use MediaPipe as an environment for iteratively improving their application, with results reproducible across different devices and platforms. MediaPipe will be open-sourced at https://github.com/google/mediapipe.

Research Areas