
Conference Paper

HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

MPS-Authors

Wang, Jiayi
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Luvizon, Diogo
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Kortylewski, Adam
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Theobalt, Christian
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Fulltext (public)

arXiv:2210.01692.pdf
(Preprint), 7 MB

099-106.pdf
(Publisher version), 7 MB

Citation

Wang, J., Luvizon, D., Mueller, F., Bernard, F., Kortylewski, A., Casas, D., et al. (2022). HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow. In International Symposium on Vision, Modeling, and Visualization (pp. 99-106). Eurographics Association. doi:10.2312/vmv.20221209.


Cite as: https://hdl.handle.net/21.11116/0000-000B-9CDC-E
Abstract
Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, even though other valid reconstructions may fit the image evidence equally well. In this paper, we propose to address this issue by explicitly modeling the distribution of plausible reconstructions in a conditional normalizing flow framework. This allows us to directly supervise the posterior distribution through a novel determinant magnitude regularization, which is key to obtaining varied 3D hand pose samples that project well into the input image. We also demonstrate that metrics commonly used to assess reconstruction quality are insufficient to evaluate pose predictions under such severe ambiguity. To address this, we release MultiHands, the first dataset with multiple plausible annotations per image. The additional annotations enable us to evaluate the estimated distribution using the maximum mean discrepancy metric. Through this, we demonstrate the quality of our probabilistic reconstruction and show that explicit ambiguity modeling is better suited to this challenging problem.
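
To make the modeling idea concrete, the following is a minimal PyTorch sketch of a conditional normalizing flow over two-hand poses, trained with negative log-likelihood plus a penalty on the log-determinant magnitude. The coupling architecture, the dimensions, the names HandPoseFlow and flow_loss, and the exact form and sign of the regularizer are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (PyTorch), not the paper's code: a conditional
    # normalizing flow built from affine coupling layers, conditioned on an
    # image feature vector, with a hypothetical log-determinant magnitude
    # regularizer added to the negative log-likelihood.
    import torch
    import torch.nn as nn

    class ConditionalAffineCoupling(nn.Module):
        """Affine coupling layer whose scale/shift depend on image features."""

        def __init__(self, dim, cond_dim, hidden=128):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (dim - self.half)),
            )

        def forward(self, x, cond):
            x1, x2 = x[:, :self.half], x[:, self.half:]
            s, t = self.net(torch.cat([x1, cond], dim=-1)).chunk(2, dim=-1)
            s = torch.tanh(s)  # bound the log-scales for numerical stability
            y2 = x2 * torch.exp(s) + t
            return torch.cat([x1, y2], dim=-1), s.sum(dim=-1)  # output, log|det J|

    class HandPoseFlow(nn.Module):
        """Stack of couplings; maps a pose to a latent z given image features."""

        def __init__(self, pose_dim, cond_dim, n_layers=4):
            super().__init__()
            self.layers = nn.ModuleList(
                [ConditionalAffineCoupling(pose_dim, cond_dim) for _ in range(n_layers)]
            )

        def forward(self, pose, cond):
            z, log_det = pose, pose.new_zeros(pose.shape[0])
            for layer in self.layers:
                z = torch.flip(z, dims=[-1])  # alternate which half is transformed
                z, ld = layer(z, cond)
                log_det = log_det + ld
            return z, log_det

    def flow_loss(model, pose, cond, reg_weight=0.01):
        z, log_det = model(pose, cond)
        # Change of variables: log p(pose|image) = log N(z; 0, I) + log|det J|.
        nll = 0.5 * (z ** 2).sum(dim=-1) - log_det
        # Hypothetical determinant magnitude term: discourage extreme log-dets
        # that would collapse the predicted distribution onto a single pose.
        return nll.mean() + reg_weight * log_det.abs().mean()

    # Hypothetical shapes: 126-D two-hand pose (42 joints x 3), 256-D features.
    model = HandPoseFlow(pose_dim=126, cond_dim=256)
    loss = flow_loss(model, torch.randn(8, 126), torch.randn(8, 256))
    loss.backward()

Sampling from the trained flow would simply run the couplings in reverse from a standard-normal latent, conditioned on the same image features, to draw multiple plausible poses per image.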
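
Similarly, here is a sketch of how a set of predicted pose samples might be scored against multiple plausible annotations using maximum mean discrepancy. The Gaussian kernel, the bandwidth, and the flattened-joint pose representation are assumptions for illustration; the paper's exact kernel and distance may differ.

    # Illustrative sketch, not the paper's code: scoring predicted pose
    # samples against several plausible annotations with a biased
    # (V-statistic) squared MMD under a Gaussian kernel.
    import torch

    def gaussian_kernel(a, b, bandwidth=1.0):
        # a: (n, d), b: (m, d) -> (n, m) kernel matrix
        sq_dists = torch.cdist(a, b) ** 2
        return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

    def mmd_squared(samples, annotations, bandwidth=1.0):
        k_ss = gaussian_kernel(samples, samples, bandwidth).mean()
        k_aa = gaussian_kernel(annotations, annotations, bandwidth).mean()
        k_sa = gaussian_kernel(samples, annotations, bandwidth).mean()
        return k_ss + k_aa - 2.0 * k_sa

    # Hypothetical usage: 100 flow samples vs. 10 plausible annotations,
    # each two-hand pose flattened to 42 joints x 3 coordinates.
    samples = torch.randn(100, 126)
    annotations = torch.randn(10, 126)
    print(mmd_squared(samples, annotations).item())

A low MMD indicates that the predicted sample set covers the annotated modes without concentrating on a single pose, which is exactly what a single-estimate metric cannot measure.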