Kubric: A scalable dataset generator

Abhijit Kundu; Andrea Tagliasacchi; Anissa Yuenming Mak; Austin Stone; Carl Doersch; Cengiz Oztireli; Charles Herrmann; Dan Gnanapragasam; Daniel Duckworth; Daniel Rebain; David James Fleet; Deqing Sun; Derek Nowrouzezahrai; Dmitry Lagun; Etienne Pot; Fangcheng Zhong; Florian Golemo; Francois Belletti; Henning Meyer; Hsueh-Ti (Derek) Liu; Issam Laradji; Klaus Greff; Kwang Moo Yi; Lucas Beyer; Matan Sela; Mehdi S. M. Sajjadi; Noha Radwan; Sara Sabour; Suhani Vora; Thomas Kipf; Tianhao Wu; Vincent Sitzmann; Yilun Du; Yishu Miao

(2022)

Abstract

Data is the driving force of machine learning. The amount and quality of training data are often more important for the performance of a system than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. Synthetic data generation is a powerful tool that can address the shortcomings of collecting real data: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is less well developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations.
We also publish a collection of generated datasets and baseline results on several vision tasks.

Research Areas

Machine perception
