In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 
3D Representations

Habibie, Ikhsanul; Xu, Weipeng; Mehta, Dushyant; Pons-Moll, Gerard; Theobalt, Christian


Released

Paper


MPS-Authors

/persons/resource/persons239614

Habibie, Ikhsanul
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons206382

Xu, Weipeng
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons129023

Mehta, Dushyant
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons118756

Pons-Moll, Gerard
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;

/persons/resource/persons45610

Theobalt, Christian
Computer Graphics, MPI for Informatics, Max Planck Society;


Fulltext (public)

arXiv:1904.03289.pdf
(Preprint), 4MB


Citation

Habibie, I., Xu, W., Mehta, D., Pons-Moll, G., & Theobalt, C. (2019). In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations. Retrieved from http://arxiv.org/abs/1904.03289.

Cite as: https://hdl.handle.net/21.11116/0000-0003-F76E-C

Abstract

Convolutional Neural Network based approaches for monocular 3D human pose
estimation usually require a large amount of training images with 3D pose
annotations. While it is feasible to provide 2D joint annotations for large
corpora of in-the-wild images with humans, providing accurate 3D annotations to
such in-the-wild corpora is hardly feasible in practice. Most existing 3D
labelled data sets are either synthetically created or feature in-studio
images. 3D pose estimation algorithms trained on such data often have limited
ability to generalize to real-world scene diversity. We therefore propose a new
deep-learning-based method for monocular 3D human pose estimation that shows
high accuracy and generalizes better to in-the-wild scenes. It has a network
architecture that comprises a new disentangled hidden space encoding of
explicit 2D and 3D features, and uses supervision by a new learned projection
model from predicted 3D pose. Our algorithm can be jointly trained on image
data with 3D labels and image data with only 2D labels. It achieves
state-of-the-art accuracy on challenging in-the-wild data.
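
The joint training on 3D-labelled and 2D-only images described in the abstract can be illustrated with a small loss function. This is a minimal sketch under stated assumptions, not the authors' implementation: a fixed weak-perspective projection (per-sample scale and 2D translation) stands in for the learned projection model, and the names `project_weak_perspective`, `mixed_supervision_loss`, `has3d`, `w3d`, and `w2d` are hypothetical. Every sample contributes a 2D reprojection term; only samples flagged as having 3D labels contribute a 3D term.

```python
import numpy as np

def project_weak_perspective(pose3d, scale, trans):
    """Project 3D joints to 2D (assumed stand-in for the learned projection).

    pose3d: (B, J, 3) predicted joints, scale: (B,), trans: (B, 2).
    Returns an array of shape (B, J, 2).
    """
    return scale[:, None, None] * pose3d[..., :2] + trans[:, None, :]

def mixed_supervision_loss(pred3d, scale, trans, gt2d, gt3d, has3d,
                           w3d=1.0, w2d=1.0):
    """Mean squared joint error with 3D supervision only where labels exist."""
    # 2D reprojection term: every sample has 2D annotations.
    proj = project_weak_perspective(pred3d, scale, trans)
    loss2d = np.mean(np.sum((proj - gt2d) ** 2, axis=-1))

    # 3D term: restricted to samples that carry 3D annotations.
    mask = np.asarray(has3d).astype(bool)
    if mask.any():
        loss3d = np.mean(np.sum((pred3d[mask] - gt3d[mask]) ** 2, axis=-1))
    else:
        loss3d = 0.0
    return w3d * loss3d + w2d * loss2d

# Toy batch: sample 0 has 3D labels, sample 1 has 2D labels only.
rng = np.random.default_rng(0)
pred3d = rng.normal(size=(2, 17, 3))
gt3d = rng.normal(size=(2, 17, 3))
gt3d[1] = np.nan                      # placeholder for the missing 3D labels
gt2d = rng.normal(size=(2, 17, 2))
loss = mixed_supervision_loss(pred3d, np.ones(2), np.zeros((2, 2)),
                              gt2d, gt3d, has3d=np.array([1, 0]))
print(loss)
```

Masking the 3D term rather than zero-filling the missing labels keeps placeholder values out of the loss entirely, so 2D-only images influence the 3D prediction solely through the projection.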