Scene flow estimation is a long-standing problem in computer vision, where the goal is to recover the 3D motion of a scene from consecutive observations. Recently, there has been a growing research effort to compute scene flow using 3D point clouds. One main approach is to train a regression model that consumes source and target point clouds and outputs a per-point translation vector. An alternative approach is to learn point correspondences between the point clouds, concurrently with a refinement regression of the initial flow. In both approaches the task is very challenging, since the flow is regressed in free 3D space, and a typical solution is to resort to a large annotated synthetic dataset.
We introduce CorrFlow, a new method for scene flow estimation that can be learned from a small amount of data without ground-truth flow supervision. In contrast to previous works, we train a pure correspondence model that focuses on learning point feature representations, and initialize the flow as the difference between a source point and its softly corresponding target point. Then, at test time, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent flow field between the point clouds. Experiments on widely used datasets demonstrate the performance gains achieved by our method compared to existing leading techniques.
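
To make the flow initialization concrete, the following is a minimal sketch of how a flow could be computed from soft correspondences. It is an illustration under stated assumptions rather than the exact formulation of the method: the function name `soft_correspondence_flow`, the use of cosine similarity between point features, the softmax over all target points, and the `temperature` parameter are all hypothetical choices made here for clarity.

```python
import numpy as np


def soft_correspondence_flow(src_pts, tgt_pts, src_feat, tgt_feat, temperature=0.05):
    """Initial flow from soft correspondences (illustrative sketch).

    src_pts: (N, 3) source points, tgt_pts: (M, 3) target points.
    src_feat: (N, C) and tgt_feat: (M, C) per-point features.
    The similarity measure and temperature are assumptions of this sketch.
    """
    # L2-normalize features so the dot product is a cosine similarity.
    src_n = src_feat / (np.linalg.norm(src_feat, axis=1, keepdims=True) + 1e-8)
    tgt_n = tgt_feat / (np.linalg.norm(tgt_feat, axis=1, keepdims=True) + 1e-8)

    # (N, M) similarity between every source point and every target point.
    logits = (src_n @ tgt_n.T) / temperature

    # Numerically stable row-wise softmax: one distribution per source point.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Softly corresponding target point: weighted average of target points.
    soft_tgt = weights @ tgt_pts

    # Flow is the difference between the soft target and the source point.
    return soft_tgt - src_pts
```

In this sketch, a lower temperature pushes each row of the weight matrix toward a hard nearest-neighbor match in feature space, while a higher temperature averages over more target points. The resulting flow would then serve only as the starting point for the test-time refinement described above.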