
Released

Research paper

Single-Shot Multi-Person 3D Body Pose Estimation From Monocular RGB Input

MPG Authors
Mehta, Dushyant
Computer Graphics, MPI for Informatics, Max Planck Society;

Sotnychenko, Oleksandr
Computer Graphics, MPI for Informatics, Max Planck Society;

Mueller, Franziska
Computer Graphics, MPI for Informatics, Max Planck Society;

Xu, Weipeng
Computer Graphics, MPI for Informatics, Max Planck Society;

Pons-Moll, Gerard
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

Theobalt, Christian
Computer Graphics, MPI for Informatics, Max Planck Society;

External Resources
No external resources are available.
Full Texts (freely accessible)

arXiv:1712.03453.pdf
(Preprint), 8MB

Supplementary Material (freely accessible)
No freely accessible supplementary material is available.
Citation

Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Sridhar, S., Pons-Moll, G., et al. (2017). Single-Shot Multi-Person 3D Body Pose Estimation From Monocular RGB Input. Retrieved from http://arxiv.org/abs/1712.03453.


Citation link: http://hdl.handle.net/21.11116/0000-0000-438F-4
Abstract
We propose a new, efficient single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our fully convolutional DNN-based approach jointly infers 2D and 3D joint locations on the basis of an extended 3D location map supported by body part associations. This new formulation enables the readout of full body poses at a subset of visible joints without the need for explicit bounding box tracking. It therefore succeeds even under strong partial body occlusions by other people and objects in the scene. We also contribute the first training data set showing real images of sophisticated multi-person interactions and occlusions. To this end, we leverage multi-view video-based performance capture of individual people for ground truth annotation, together with a new image compositing approach for user-controlled synthesis of large corpora of real multi-person images. We also propose a new video-recorded multi-person test set with ground truth 3D annotations. Our method achieves state-of-the-art performance on challenging multi-person scenes.
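The abstract's central idea, reading out a full 3D body pose from location maps at the 2D pixel of a visible joint, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify how the maps are stored or which joints serve as readout points, so the layout below (one map per joint holding an (x, y, z) triple per pixel), the function names `read_out_pose` and `read_out_pose_robust`, and the "first visible candidate" fallback rule are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical layout (an assumption, not from the paper):
# location_maps[j][v][u] is the (x, y, z) position of joint j
# that the network predicts at image pixel (u, v).
LocationMaps = List[List[List[Vec3]]]


def read_out_pose(location_maps: LocationMaps, pixel: Tuple[int, int]) -> List[Vec3]:
    """Read every joint's 3D position from the maps at a single 2D pixel."""
    u, v = pixel
    return [joint_map[v][u] for joint_map in location_maps]


def read_out_pose_robust(
    location_maps: LocationMaps,
    readout_pixels: Sequence[Tuple[int, int]],
    visible: Sequence[bool],
) -> List[Vec3]:
    """Hypothetical occlusion fallback: read the full pose at the first
    candidate readout joint (in priority order) that is marked visible."""
    for pixel, is_visible in zip(readout_pixels, visible):
        if is_visible:
            return read_out_pose(location_maps, pixel)
    raise ValueError("no visible readout joint")


if __name__ == "__main__":
    # Toy 2-joint, 2x2-pixel maps where each cell stores (joint, u, v).
    maps = [[[(float(j), float(u), float(v)) for u in range(2)]
             for v in range(2)] for j in range(2)]
    # The first candidate is occluded, so the pose is read at pixel (1, 1).
    print(read_out_pose_robust(maps, [(0, 0), (1, 1)], [False, True]))
```

The point of the sketch is that no bounding box or per-person crop is needed: once one visible 2D joint of a person is located, every joint's 3D position is looked up at that same pixel, which is why partially occluded people can still receive a full-body pose.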