  XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera

Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib, M., Fua, P., et al. (2019). XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera. Retrieved from http://arxiv.org/abs/1907.00837.

Files

Name: arXiv:1907.00837.pdf (Preprint), 10MB
Description: File downloaded from arXiv at 2019-07-09 10:40
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Mehta, Dushyant (1), Author
Sotnychenko, Oleksandr (1), Author
Mueller, Franziska (1), Author
Xu, Weipeng (1), Author
Elgharib, Mohamed (1), Author
Fua, Pascal (2), Author
Seidel, Hans-Peter (1), Author
Rhodin, Helge (2), Author
Pons-Moll, Gerard (3), Author
Theobalt, Christian (1), Author
Affiliations:
(1) Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047
(2) External Organizations, ou_persistent22
(3) Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society, ou_1116547

Content

Free keywords: Computer Science, Computer Vision and Pattern Recognition (cs.CV); Computer Science, Graphics (cs.GR)
Abstract: We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions both by other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that neither extracted global body positions nor joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes.
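The three-stage pipeline outlined in the abstract can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: the function names, the joint count, and the placeholder computations inside each stage are all assumptions made for the sake of a runnable example; only the stage structure (CNN features, per-subject 3D lifting, space-time fitting) follows the abstract.

```python
import numpy as np

NUM_JOINTS = 21  # assumed joint count, for illustration only


def stage1_cnn(image):
    """Stage 1 stand-in: the CNN predicts 2D locations, 3D pose features,
    and identity assignments for all visible joints of each individual."""
    rng = np.random.default_rng(0)
    pose2d = rng.uniform(0.0, 1.0, size=(NUM_JOINTS, 2))  # normalized 2D joints
    feats3d = rng.normal(size=(NUM_JOINTS, 3))            # 3D pose features
    identity = np.zeros(NUM_JOINTS, dtype=int)            # one subject here
    visible = rng.uniform(size=NUM_JOINTS) > 0.2          # some joints occluded
    return pose2d, feats3d, identity, visible


def stage2_lift(pose2d, feats3d, visible):
    """Stage 2 stand-in: complete the possibly partial observation into a
    full per-subject 3D pose; occluded joints get a neutral estimate."""
    return np.where(visible[:, None], feats3d, 0.0)


def stage3_fit(pose3d_seq):
    """Stage 3 stand-in: temporal averaging as a crude proxy for
    space-time skeletal model fitting and temporal coherence."""
    return np.mean(pose3d_seq, axis=0)


# 512x320 input frames, matching the resolution quoted in the abstract.
frames = [np.zeros((320, 512, 3)) for _ in range(3)]
poses = []
for img in frames:
    p2d, f3d, ident, vis = stage1_cnn(img)
    poses.append(stage2_lift(p2d, f3d, vis))
smoothed = stage3_fit(np.stack(poses))
print(smoothed.shape)  # (21, 3): one full 3D pose per subject
```

The real system additionally recovers joint angles of a coherent skeleton and handles multiple identities per frame; this sketch only mirrors the data flow between the three stages.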

Details

Language(s): eng - English
Dates: 2019-07-01; 2019
Publication Status: Published online
Pages: 18 p.
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: arXiv: 1907.00837
URI: http://arxiv.org/abs/1907.00837
BibTex Citekey: Mehta_arXiv1907.00837
Degree: -
