Conference Paper

MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes

MPS-Authors

Li, Zhi
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Shimada, Soshi
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Schiele, Bernt
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Theobalt, Christian
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Golyanik, Vladislav
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society

Fulltext (public)

arXiv:2208.08439.pdf (Preprint), 5 MB

Citation

Li, Z., Shimada, S., Schiele, B., Theobalt, C., & Golyanik, V. (in press). MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes. In International Conference on 3D Vision. Piscataway, NJ: IEEE.


Cite as: https://hdl.handle.net/21.11116/0000-000B-9CB9-5
Abstract
3D human motion capture from monocular RGB images that respects interactions of a subject with complex and possibly deformable environments is a very challenging, ill-posed and under-explored problem. Existing methods address it only weakly and do not model the surface deformations that often occur when humans interact with scene surfaces. In contrast, this paper proposes MoCapDeform, a new framework for monocular 3D human motion capture that is the first to explicitly model non-rigid deformations of a 3D scene for improved 3D human pose estimation and deformable environment reconstruction. MoCapDeform accepts a monocular RGB video and a 3D scene mesh aligned in camera space. It first localises the subject in the input video along with dense contact labels using a new raycasting-based strategy. Next, human-environment interaction constraints are leveraged to jointly optimise global 3D human poses and non-rigid surface deformations. MoCapDeform achieves higher accuracy than competing methods on several datasets, including our newly recorded one with deforming background scenes.
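
To make the raycasting-based contact labelling mentioned in the abstract more concrete, below is a minimal illustrative sketch in Python. It assumes trimesh for ray-mesh intersection; the function name label_contacts, the contact threshold tau, and the inputs scene_mesh and body_verts are hypothetical, and the sketch is not the authors' actual implementation.

import numpy as np
import trimesh

def label_contacts(scene_mesh, body_verts, cam_origin=np.zeros(3), tau=0.02):
    """Label each estimated body vertex as in contact with the scene.

    scene_mesh : trimesh.Trimesh aligned in camera space (assumed input)
    body_verts : (N, 3) array of estimated 3D body vertices (assumed input)
    tau        : contact distance threshold in metres (assumed value)
    """
    # Cast one ray from the camera centre through every body vertex.
    dirs = body_verts - cam_origin
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    origins = np.tile(cam_origin, (len(body_verts), 1))

    # First intersection of each ray with the scene mesh.
    locations, ray_ids, _ = scene_mesh.ray.intersects_location(
        origins, dirs, multiple_hits=False)

    # A vertex counts as a contact when it lies within tau of the
    # scene surface point hit by its ray.
    labels = np.zeros(len(body_verts), dtype=bool)
    dists = np.linalg.norm(body_verts[ray_ids] - locations, axis=1)
    labels[ray_ids] = dists < tau
    return labels

The joint optimisation of global body pose and non-rigid scene deformation described in the abstract would then consume such labels, using contact points as human-environment correspondences in its objective.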