Francis Engelmann
Authored Publications
We introduce the task of open-vocabulary 3D instance segmentation.
Traditional approaches for 3D instance segmentation largely rely on existing 3D annotated datasets, which are restricted to a closed set of object categories.
This is an important limitation for real-life applications in which an autonomous agent might need to perform tasks guided by novel, open-vocabulary queries related to objects from a wider range of categories.
Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods have no notion of object instances.
In this work, we address this problem and propose OpenMask3D, a zero-shot approach for open-vocabulary 3D instance segmentation.
Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings.
We conduct experiments and ablation studies on the ScanNet200 dataset to evaluate the performance of OpenMask3D, and provide insights about the task of open-vocabulary 3D instance segmentation. We show that our approach outperforms other open-vocabulary counterparts, particularly on the long-tail distribution.
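The per-mask feature aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the CLIP image and text embeddings are stood in for by plain NumPy vectors, and the visibility-weighted averaging scheme is an assumption made for the sketch.

```python
import numpy as np

def aggregate_mask_features(view_embeddings, visibility):
    """Fuse per-view image-crop embeddings into one feature per instance mask.

    view_embeddings: (num_views, dim) embeddings of the mask's 2D crops
    visibility:      (num_views,) how strongly the mask is visible per view
    Returns a unit-normalized (dim,) feature for the mask.
    """
    w = visibility / (visibility.sum() + 1e-8)          # normalize view weights
    feat = (w[:, None] * view_embeddings).sum(axis=0)   # weighted average over views
    return feat / (np.linalg.norm(feat) + 1e-8)

def query_instances(mask_features, text_embedding):
    """Rank instance masks by cosine similarity to an open-vocabulary text query."""
    text = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    scores = mask_features @ text                       # cosine, since rows are unit-norm
    return np.argsort(-scores), scores
```

In a real system the view embeddings would come from a CLIP image encoder applied to crops of each predicted class-agnostic mask, and the query embedding from the matching CLIP text encoder.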
We propose a method to detect and reconstruct multiple 3D objects from a single 2D image. The method is based on a key-point detector that localizes object centers in the image and then predicts all properties needed for multi-object reconstruction: oriented 3D bounding boxes, 3D shapes, and semantic class labels. We formulate 3D shape reconstruction as a classification problem, i.e. selecting among exemplar CAD models from the training set. This makes the method agnostic to specific shape representations and enables the reconstruction of realistic and visually pleasing shapes (unlike e.g. voxel-based methods). At the same time, we rely on point clouds and voxel representations derived from the CAD models to formulate the loss functions. In particular, a collision loss penalizes intersecting objects, further increasing the realism of the reconstructed scenes. The method is a single-stage approach and is therefore orders of magnitude faster than two-stage approaches; it is fully differentiable and end-to-end trainable.
3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation
Bastian Leibe
Matthias Niessner
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
We present 3D-MPA, a method for instance segmentation on 3D point clouds.
Given an input point cloud, we propose an object-centric approach where each point votes for its object center.
We sample object proposals from the predicted object centers.
Then, we learn proposal features from grouped point features that voted for the same object center.
A graph convolutional network introduces inter-proposal relations, providing higher-level feature learning in addition to the lower-level point features.
Each proposal comprises a semantic label, a set of associated points over which we define a foreground-background mask, an objectness score, and aggregation features.
Previous works usually perform non-maximum-suppression (NMS) over proposals to obtain the final object detections or semantic instances.
However, NMS can discard potentially correct predictions.
Instead, our approach keeps all proposals and groups them together based on the learned aggregation features.
We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.
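The grouping step that replaces NMS can be sketched as follows. The paper clusters proposals in the learned aggregation-feature space; the greedy distance-threshold grouping below is a simplified stand-in for that clustering, with the threshold chosen arbitrarily for illustration.

```python
import numpy as np

def group_proposals(features, threshold=0.5):
    """Group proposals whose aggregation features are close, instead of
    suppressing overlapping proposals with NMS.

    features: (n, d) per-proposal aggregation features
    Returns a list of index groups; each group yields one final instance.
    """
    unassigned = set(range(len(features)))
    groups = []
    while unassigned:
        seed = unassigned.pop()
        group = [seed]
        # absorb every remaining proposal close to the seed in feature space
        for j in sorted(unassigned):
            if np.linalg.norm(features[j] - features[seed]) < threshold:
                group.append(j)
                unassigned.discard(j)
        groups.append(sorted(group))
    return groups
```

Unlike NMS, no proposal is discarded: all proposals in a group contribute to the final instance (e.g. by merging their masks), which is why potentially correct predictions are not lost.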