  XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera

Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib, M., Fua, P., et al. (2019). XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera. Retrieved from http://arxiv.org/abs/1907.00837.

Files

Name: arXiv:1907.00837.pdf (Preprint), 10MB
Description: File downloaded from arXiv at 2019-07-09 10:40
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Mehta, Dushyant (1), Author
Sotnychenko, Oleksandr (1), Author
Mueller, Franziska (1), Author
Xu, Weipeng (1), Author
Elgharib, Mohamed (1), Author
Fua, Pascal (2), Author
Seidel, Hans-Peter (1), Author
Rhodin, Helge (2), Author
Pons-Moll, Gerard (3), Author
Theobalt, Christian (1), Author
Affiliations:
(1) Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047
(2) External Organizations, ou_persistent22
(3) Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society, ou_1116547

Content

Free keywords: Computer Science, Computer Vision and Pattern Recognition (cs.CV); Computer Science, Graphics (cs.GR)
Abstract: We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions both by other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that neither extracted global body positions nor joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes.
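The three-stage pipeline outlined in the abstract can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: the function names, the joint count, and the placeholder computations inside each stage are all assumptions made for the sake of a runnable example; only the stage structure (CNN features, per-subject 3D lifting, space-time fitting) follows the abstract.

```python
import numpy as np

NUM_JOINTS = 21  # assumed joint count, for illustration only


def stage1_cnn(image):
    """Stage 1 stand-in: the CNN predicts 2D locations, 3D pose features,
    and identity assignments for all visible joints of each individual."""
    rng = np.random.default_rng(0)
    pose2d = rng.uniform(0.0, 1.0, size=(NUM_JOINTS, 2))  # normalized 2D joints
    feats3d = rng.normal(size=(NUM_JOINTS, 3))            # 3D pose features
    identity = np.zeros(NUM_JOINTS, dtype=int)            # one subject here
    visible = rng.uniform(size=NUM_JOINTS) > 0.2          # some joints occluded
    return pose2d, feats3d, identity, visible


def stage2_lift(pose2d, feats3d, visible):
    """Stage 2 stand-in: complete the possibly partial observation into a
    full per-subject 3D pose; occluded joints get a neutral estimate."""
    return np.where(visible[:, None], feats3d, 0.0)


def stage3_fit(pose3d_seq):
    """Stage 3 stand-in: temporal averaging as a crude proxy for
    space-time skeletal model fitting and temporal coherence."""
    return np.mean(pose3d_seq, axis=0)


# 512x320 input frames, matching the resolution quoted in the abstract.
frames = [np.zeros((320, 512, 3)) for _ in range(3)]
poses = []
for img in frames:
    p2d, f3d, ident, vis = stage1_cnn(img)
    poses.append(stage2_lift(p2d, f3d, vis))
smoothed = stage3_fit(np.stack(poses))
print(smoothed.shape)  # (21, 3): one full 3D pose per subject
```

The real system additionally recovers joint angles of a coherent skeleton and handles multiple identities per frame; this sketch only mirrors the data flow between the three stages.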

Details

Language(s): eng - English
Dates: 2019-07-01; 2019
Publication Status: Published online
Pages: 18 p.
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: arXiv: 1907.00837
URI: http://arxiv.org/abs/1907.00837
BibTex Citekey: Mehta_arXiv1907.00837
Degree: -
