English
 
User Manual Privacy Policy Disclaimer Contact us
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Paper

Pose-Guided Human Animation from a Single Image in the Wild

MPS-Authors
/persons/resource/persons226679

Liu,  Lingjie
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons239654

Golyanik,  Vladislav
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons45610

Theobalt,  Christian
Computer Graphics, MPI for Informatics, Max Planck Society;

External Ressource
No external resources are shared
Fulltext (public)

arXiv:2012.03796.pdf
(Preprint), 3MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Yoon, J. S., Liu, L., Golyanik, V., Sarkar, K., Park, H. S., & Theobalt, C. (2020). Pose-Guided Human Animation from a Single Image in the Wild. Retrieved from https://arxiv.org/abs/2012.03796.


Cite as: http://hdl.handle.net/21.11116/0000-0007-E9F3-0
Abstract
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses. Existing pose transfer methods exhibit significant visual artifacts when applying to a novel scene, resulting in temporal inconsistency and failures in preserving the identity and textures of the person. To address these limitations, we design a compositional neural network that predicts the silhouette, garment labels, and textures. Each modular network is explicitly dedicated to a subtask that can be learned from the synthetic data. At the inference time, we utilize the trained network to produce a unified representation of appearance and its labels in UV coordinates, which remains constant across poses. The unified representation provides an incomplete yet strong guidance to generating the appearance in response to the pose change. We use the trained network to complete the appearance and render it with the background. With these strategies, we are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene. Experiments show that our method outperforms the state-of-the-arts in terms of synthesis quality, temporal coherence, and generalization ability.