The Objectron dataset is a collection of short, object-centric video clips that capture objects from different angles. Each clip is accompanied by AR session metadata that includes camera poses, sparse point clouds, and surface planes. The data also contain manually annotated 3D bounding boxes for each object, which describe the object’s position, orientation, and dimensions. The dataset consists of 15K annotated video clips, supplemented with over 4M annotated images, covering a limited set of categories of common household objects. In addition, to ensure geo-diversity, our dataset was collected in 10 countries across five continents.
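To make the bounding-box description concrete, the following minimal sketch shows how a position, an orientation, and a set of dimensions together determine the eight corners of an oriented 3D box. The function name `box_corners`, the use of a 3x3 rotation matrix for orientation, and the treatment of dimensions as full extents along the box's local axes are all illustrative assumptions; the dataset's actual annotation format is not specified here and may differ.

```python
from itertools import product


def box_corners(center, rotation, dimensions):
    """Return the 8 corners of an oriented 3D box in world coordinates.

    Hypothetical parametrization (an assumption, not the dataset's schema):
      center:     (x, y, z) position of the box center.
      rotation:   3x3 rotation matrix as nested lists, box frame -> world frame.
      dimensions: full extents of the box along its three local axes.
    """
    corners = []
    for signs in product((-0.5, 0.5), repeat=3):
        # Corner in the box's local frame: half the extent along each axis.
        local = [s * d for s, d in zip(signs, dimensions)]
        # Rotate into the world frame, then translate by the center.
        world = tuple(
            sum(rotation[i][j] * local[j] for j in range(3)) + center[i]
            for i in range(3)
        )
        corners.append(world)
    return corners


# Example: a 2 x 1 x 1 box centered at (1, 0, 0), rotated 90 degrees about z.
rot_z_90 = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
corners = box_corners((1.0, 0.0, 0.0), rot_z_90, (2.0, 1.0, 1.0))
```

After the rotation, the box's long local axis lies along the world y-axis, so the corners span 2 units in y and 1 unit in x around the center. Nine values (three each for position, orientation, and dimensions) are therefore enough to recover the full box geometry.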