English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Paper

High-Fidelity Neural Human Motion Transfer from Monocular Video

MPS-Authors
/persons/resource/persons239654

Golyanik,  Vladislav
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons229949

Elgharib,  Mohamed
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons45449

Seidel,  Hans-Peter
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons45610

Theobalt,  Christian
Computer Graphics, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

arXiv:2012.10974.pdf
(Preprint), 25MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Kappel, M., Golyanik, V., Elgharib, M., Henningson, J.-O., Seidel, H.-P., Castillo, S., et al. (2020). High-Fidelity Neural Human Motion Transfer from Monocular Video. Retrieved from https://arxiv.org/abs/2012.10974.


Cite as: http://hdl.handle.net/21.11116/0000-0007-B715-3
Abstract
Video-based human motion transfer creates video animations of humans following a source motion. Current methods show remarkable results for tightly-clad subjects. However, the lack of temporally consistent handling of plausible clothing dynamics, including fine and high-frequency details, significantly limits the attainable visual quality. We address these limitations for the first time in the literature and present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations, for several types of loose garments. In contrast to the previous techniques, we perform image generation in three subsequent stages, synthesizing human shape, structure, and appearance. Given a monocular RGB video of an actor, we train a stack of recurrent deep neural networks that generate these intermediate representations from 2D poses and their temporal derivatives. Splitting the difficult motion transfer problem into subtasks that are aware of the temporal motion context helps us to synthesize results with plausible dynamics and pose-dependent detail. It also allows artistic control of results by manipulation of individual framework stages. In the experimental results, we significantly outperform the state-of-the-art in terms of video realism. Our code and data will be made publicly available.