HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models

Huaizu Jiang
Varun Jampani
Yiming Xie
Zizhao Wu
Xiaogang Peng
HuMoGen Workshop, CVPR 2025

Abstract

We address the problem of generating realistic motions for 3D human-object interactions. Unlike previous methods, which mainly focus on static objects and limited interaction types, our work jointly synthesizes human and object motion for diverse interactions conditioned on text. Our key idea is to leverage a motion diffusion model for both human and object motion synthesis, producing coherent interaction motions. To generate physically plausible contact between human and object, we propose an interaction correction module, a diffusion-based affordance prediction model paired with spatial guidance, which refines the interactions at each diffusion step. Experiments on the BEHAVE dataset demonstrate the effectiveness of our approach, producing realistic motions for a variety of human-object interactions from text prompts.
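The spatial-guidance idea described above, nudging the sampled motion toward predicted human-object contact at each denoising step, can be sketched as a gradient-based correction inside the diffusion loop. The sketch below is an illustration under loudly stated assumptions, not the paper's implementation: the denoiser is a stand-in (a damped identity instead of the learned motion model), the feature layout (22 joints x 3 coordinates for the human, plus a 3D object center) is hypothetical, and the contact objective simply pulls one assumed contact joint toward the object center rather than using a learned affordance prediction.

```python
import torch

# Assumed (not from the paper) per-frame layout: 22 joints x 3 = 66 human
# dims, followed by a 3D object center.
H_DIM, O_DIM = 66, 3

def denoise(x_t):
    """Stand-in denoiser: returns a damped copy of the input.
    In the real system this would be the learned motion diffusion model."""
    return 0.9 * x_t

def contact_loss(x0):
    """Hypothetical contact objective: squared distance between an assumed
    contact joint (joint 0) and the object center. A learned affordance
    model would instead predict which joints should be in contact where."""
    human = x0[..., :H_DIM].reshape(*x0.shape[:-1], 22, 3)
    obj_center = x0[..., H_DIM:]
    return ((human[..., 0, :] - obj_center) ** 2).sum()

def guided_denoise_step(x_t, scale=0.05):
    """One denoising step with spatial guidance: take the gradient of the
    contact loss w.r.t. the noisy sample and nudge the prediction along
    the descent direction, as in classifier-guidance-style correction."""
    x_t = x_t.detach().requires_grad_(True)
    x0 = denoise(x_t)
    loss = contact_loss(x0)
    (grad,) = torch.autograd.grad(loss, x_t)
    return (x0 - scale * grad).detach()

# Run a few guided steps on a toy noisy human+object motion (4 frames).
x = torch.randn(4, H_DIM + O_DIM)
for _ in range(10):
    x = guided_denoise_step(x)
```

Applying the correction at every step, rather than once at the end, lets the contact constraint shape the whole trajectory while the sample is still being refined.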