HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from 
a Single Depth Map

Malik, Jameel; Abdelaziz, Ibrahim; Elhayek, Ahmed; Shimada, Soshi; Ali, Sk Aziz; Golyanik, Vladislav; Theobalt, Christian; Stricker, Didier

Datensatz

DATENSATZ AKTIONENEXPORT

Zur Ablage hinzufügen

Lokale TagsFreigabegeschichteDetailsÜbersicht

Freigegeben

Forschungspapier

HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map

MPG-Autoren

/persons/resource/persons255947

Shimada, Soshi
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons239654

Golyanik, Vladislav
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons45610

Theobalt, Christian
Computer Graphics, MPI for Informatics, Max Planck Society;

Externe Ressourcen

Es sind keine externen Ressourcen hinterlegt

Volltexte (beschränkter Zugriff)

Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.

Volltexte (frei zugänglich)

arXiv:2004.01588.pdf
(Preprint), 3MB

Ergänzendes Material (frei zugänglich)

Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar

Zitation

Malik, J., Abdelaziz, I., Elhayek, A., Shimada, S., Ali, S. A., Golyanik, V., et al. (2020). HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map. Retrieved from https://arxiv.org/abs/2004.01588.

Zitierlink: https://hdl.handle.net/21.11116/0000-0007-E0FF-D

Zusammenfassung

3D hand shape and pose estimation from a single depth map is a new and
challenging computer vision problem with many applications. The
state-of-the-art methods directly regress 3D hand meshes from 2D depth images
via 2D convolutional neural networks, which leads to artefacts in the
estimations due to perspective distortions in the images. In contrast, we
propose a novel architecture with 3D convolutions trained in a
weakly-supervised manner. The input to our method is a 3D voxelized depth map,
and we rely on two hand shape representations. The first one is the 3D
voxelized grid of the shape which is accurate but does not preserve the mesh
topology and the number of mesh vertices. The second representation is the 3D
hand surface which is less accurate but does not suffer from the limitations of
the first representation. We combine the advantages of these two
representations by registering the hand surface to the voxelized hand shape. In
the extensive experiments, the proposed approach improves over the state of the
art by 47.8% on the SynHand5M dataset. Moreover, our augmentation policy for
voxelized depth maps further enhances the accuracy of 3D hand pose estimation
on real data. Our method produces visually more reasonable and realistic hand
shapes on NYU and BigHand2.2M datasets compared to the existing approaches.