Keywords:
Computer Science, Computer Vision and Pattern Recognition, cs.CV
Abstract:
We propose a model-based generative loss, built on a volumetric hand model,
for training hand pose estimators on depth images. This additional
loss allows training of a hand pose estimator that accurately infers the entire
set of 21 hand keypoints while only using supervision for 6 easy-to-annotate
keypoints (fingertips and wrist). We show that our partially-supervised method
achieves results that are comparable to those of fully-supervised methods which
enforce articulation consistency. Moreover, for the first time we demonstrate
that such an approach can be used to train on datasets that have erroneous
annotations, i.e., "ground truth" with notable measurement errors, while
obtaining predictions that explain the depth images better than the given
"ground truth".
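
As a rough illustration of how such a partially-supervised objective could be combined, the sketch below adds a keypoint term over the 6 annotated keypoints to a generative term that measures how well the predicted pose explains the observed depth points. It is not the authors' implementation: the keypoint ordering (wrist at index 0, fingertips at 4, 8, 12, 16, 20), the equal-radius sphere approximation standing in for the volumetric hand model, the function names, and the `radius` and `weight` parameters are all assumptions made for this example.

```python
import math

# Assumed ordering of the 21 keypoints: wrist = 0, fingertips = 4, 8, 12, 16, 20.
SUPERVISED = [0, 4, 8, 12, 16, 20]


def supervised_loss(pred, labels):
    """Mean squared distance over the 6 annotated keypoints only.

    pred:   list of 21 (x, y, z) predicted keypoints
    labels: dict mapping a supervised keypoint index to its (x, y, z) label
    """
    total = 0.0
    for i in SUPERVISED:
        total += sum((p - q) ** 2 for p, q in zip(pred[i], labels[i]))
    return total / len(SUPERVISED)


def generative_loss(pred, depth_points, radius=1.0):
    """Hypothetical stand-in for the model-based term.

    The hand is approximated by equal-radius spheres centred on the
    predicted keypoints. Each observed depth point is penalised by its
    squared distance to the surface of the union of those spheres.
    """
    if not depth_points:
        return 0.0
    total = 0.0
    for point in depth_points:
        nearest_centre = min(math.dist(point, k) for k in pred)
        total += (nearest_centre - radius) ** 2
    return total / len(depth_points)


def total_loss(pred, labels, depth_points, weight=1.0):
    """Keypoint supervision plus the weighted generative term."""
    return supervised_loss(pred, labels) + weight * generative_loss(pred, depth_points)
```

In this sketch the 15 unlabelled keypoints receive no direct supervision; they are constrained only through the generative term, which mirrors the idea that the depth image itself supplies the missing training signal.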