Text to 3D Object Generation for Scalable Room Assembly
Abstract
Modern machine learning models for scene understanding, such as depth estimation and object tracking, rely on large, high-quality datasets that mimic real-world deployment scenarios. To address data scarcity, we present an end-to-end system for synthetic data generation that produces scalable, high-quality, and customizable 3D indoor scenes. By integrating text-to-image and multi-view diffusion models with NeRF-based meshing, the system generates high-fidelity 3D assets from text prompts and incorporates them into pre-defined floor plans using the rendering tool Blender. By incorporating novel loss functions and training strategies into existing methods, our approach supports on-demand object generation, bridging the domain gap between synthetic and real-world data. This system advances the role of synthetic data in addressing machine learning training limitations, enabling more robust and generalizable models for real-world applications.