We present H-NeRF, neural radiance fields for rendering and temporal (4D) reconstruction of a human in motion, as captured by a sparse set of cameras or even by a monocular video. Our NeRF-inspired approach combines ideas from neural scene representation, novel-view synthesis, and implicit statistical geometric human representations. H-NeRF accurately synthesizes images of the observed subject under novel camera views and human poses. Instead of learning a radiance field in empty space, we attach it to a structured implicit human body model, represented using signed distance functions. This allows us to robustly fuse information from sparse views and, at test time, to extrapolate beyond the observed poses or views. Moreover, we apply geometric constraints to co-learn the structure of the observed subject (including both body and clothing) and to regularize the radiance field toward geometrically plausible solutions. Extensive experiments on multiple datasets demonstrate the robustness and accuracy of our approach and its ability to generalize beyond the sparse training set of poses and views.