We propose a method to detect and reconstruct multiple 3D objects from a single 2D image. The method is based on a key-point detector that localizes object centers in the image and then predicts all properties necessary for multi-object reconstruction: oriented 3D bounding boxes, 3D shapes, and semantic class labels. Our method formulates 3D shape reconstruction as a classification problem, i.e. selecting among exemplar CAD models from the training set. This makes it agnostic to shape representations and enables the reconstruction of realistic and visually pleasing shapes (unlike e.g. voxel-based methods). At the same time, we rely on point clouds and voxel representations derived from the CAD models to formulate the loss functions. In particular, a collision loss penalizes intersecting objects, further increasing the realism of the reconstructed scenes. Because the method is a single-stage approach, it is orders of magnitude faster than two-stage approaches; it is also fully differentiable and end-to-end trainable.
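The abstract does not give the exact form of the collision loss. As a rough illustration of how a point-cloud and voxel representation of two CAD models can expose an intersection, the sketch below places object A's surface points into the world using its predicted pose, maps them into object B's canonical frame, and counts how many fall into B's occupied voxels. Everything here is an assumption for illustration: the function names (`world_to_local`, `collision_penalty`), the pose convention `x_world = R @ x_local + t`, and the axis-aligned voxel grid described by `grid_min` and `voxel_size` are hypothetical and are not taken from the method itself.

```python
import numpy as np


def world_to_local(points, rotation, translation):
    # Hypothetical pose convention: x_world = R @ x_local + t.
    # Inverse for row vectors: x_local = (x_world - t) @ R.
    return (points - translation) @ rotation


def collision_penalty(points_a, pose_a, occupancy_b, pose_b, grid_min, voxel_size):
    """Fraction of object A's surface points lying in occupied voxels of object B.

    points_a:    (N, 3) points sampled from A's CAD model, in A's canonical frame
    pose_a/b:    (R, t) with R a (3, 3) rotation and t a (3,) translation
    occupancy_b: (D, H, W) boolean voxel grid of B, in B's canonical frame
    grid_min:    (3,) corner of B's voxel grid in B's canonical frame
    """
    R_a, t_a = pose_a
    R_b, t_b = pose_b
    # Place A's points in the world, then express them in B's canonical frame.
    world = points_a @ R_a.T + t_a
    local_b = world_to_local(world, R_b, t_b)
    # Nearest-voxel lookup (hard, hence not differentiable).
    idx = np.floor((local_b - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(occupancy_b.shape)), axis=1)
    hits = np.zeros(len(points_a), dtype=bool)
    i = idx[inside]
    hits[inside] = occupancy_b[i[:, 0], i[:, 1], i[:, 2]]
    return float(hits.mean())


# Usage: B is a solid cube covering [-1, 1]^3 at the origin; A has two surface points.
occ = np.ones((4, 4, 4), dtype=bool)
grid_min = np.array([-1.0, -1.0, -1.0])
pts_a = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
I = np.eye(3)
overlap = collision_penalty(pts_a, (I, np.zeros(3)), occ, (I, np.zeros(3)), grid_min, 0.5)
partial = collision_penalty(pts_a, (I, np.array([0.8, 0.0, 0.0])), occ, (I, np.zeros(3)), grid_min, 0.5)
apart = collision_penalty(pts_a, (I, np.array([5.0, 0.0, 0.0])), occ, (I, np.zeros(3)), grid_min, 0.5)
```

Because the abstract stresses that the method is fully differentiable, a trainable variant would presumably replace the hard nearest-voxel lookup with a soft one (for example trilinear interpolation of occupancy), so that gradients can flow back to the predicted poses; the version above only shows the geometric idea.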