- Christine Kaeser-Chen
- Onur Gonen Guleryuz
We introduce a simple model of the human hand skeleton geared toward estimating 3D hand poses from 2D keypoints. The estimation problem arises in AR/VR scenarios where low-cost cameras provide 2D views through which rich interactions with the world are desired. Starting with a noisy set of 2D hand keypoints (camera-plane coordinates of detected hand joints), the proposed algorithm generates 3D keypoints that (i) comply with human hand skeleton constraints and (ii) perspective-project down to the given 2D keypoints. Our work treats the 2D-to-3D lifting problem algebraically, identifies the parts of the hand that can be lifted accurately, points out the parts that may lead to ambiguities, and proposes remedies for the ambiguous cases. Most importantly, we show that fingertip localization errors are a good proxy for the errors at the other finger joints. This observation leads to a look-up-table-based formulation that instantaneously determines finger poses without solving constrained trigonometric problems. The result is a fast algorithm that runs faster than real time on a single core. When hand bone lengths are unknown, our technique estimates them, enabling smooth AR/VR sessions in which the user's hand is estimated automatically at the start and the rest of the session continues seamlessly. Our work provides accurate 3D results that are competitive with the state of the art without requiring any 3D training data.
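
To make the algebraic view of lifting concrete, the following is a minimal sketch of one generic building block, not the algorithm proposed in this work. It assumes a pinhole camera with unit focal length, 2D keypoints given in normalized image coordinates, a parent joint whose 3D position is already known, and a known bone length. The function name `lift_child_joint` and all parameter names are hypothetical. Under these assumptions the child joint lies on the camera ray through its 2D keypoint and on a sphere of radius equal to the bone length around the parent, which gives a quadratic in the unknown depth. Two positive roots illustrate one way a depth ambiguity can arise; no real root indicates that noise in the 2D keypoint is inconsistent with the bone length.

```python
import math


def lift_child_joint(parent, child_2d, bone_length):
    """Return candidate 3D positions for a child joint (hypothetical helper).

    parent:      (X, Y, Z) position of the parent joint in camera coordinates.
    child_2d:    (x, y) normalized image coordinates of the child keypoint
                 (assumed pinhole camera, focal length 1).
    bone_length: assumed-known distance between parent and child.
    """
    # Every 3D point that projects to (x, y) has the form z * (x, y, 1), z > 0.
    ray = (child_2d[0], child_2d[1], 1.0)

    # Bone-length constraint |z * ray - parent|^2 = bone_length^2
    # expands to a * z^2 + b * z + c = 0.
    a = sum(r * r for r in ray)
    b = -2.0 * sum(r * p for r, p in zip(ray, parent))
    c = sum(p * p for p in parent) - bone_length ** 2

    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        # The ray misses the sphere: the 2D keypoint and bone length disagree.
        return []

    root = math.sqrt(disc)
    depths = sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})
    # Keep only solutions in front of the camera.
    return [tuple(z * r for r in ray) for z in depths if z > 0.0]


# Usage: a parent joint 10 units in front of the camera, a bone of length 2,
# and a child keypoint at the image center yield a near and a far candidate.
candidates = lift_child_joint((0.0, 0.0, 10.0), (0.0, 0.0), 2.0)
```

In this toy setup the two candidates lie on either side of the parent joint along the viewing direction, so additional skeleton constraints would be needed to choose between them.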