HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models
Abstract
We address the problem of generating motions for 3D human-object interactions. Unlike previous methods, which mainly focus on static objects and a limited range of interaction types, our work jointly synthesizes human and object motion for diverse interactions conditioned on text. Our key idea is to leverage a motion diffusion model for both human and object motion synthesis, producing coherent interaction motion. To generate physically plausible contact between the human and the object, we propose an interaction correction module, a diffusion-based affordance prediction model paired with spatial guidance, which refines the interactions at each diffusion step. Experiments on the BEHAVE dataset demonstrate the effectiveness of our approach, producing realistic motions for various human-object interactions from text prompts.