Paper

Learning a Category-level Object Pose Estimator without Pose Annotations

MPS-Authors
Kortylewski, Adam
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Fulltext (public)

arXiv:2404.05626.pdf
(Preprint), 6MB

Citation

Tian, F., Liu, Y., Kortylewski, A., Duan, Y., Du, S., Yuille, A., et al. (2024). Learning a Category-level Object Pose Estimator without Pose Annotations. Retrieved from https://arxiv.org/abs/2404.05626.


Cite as: https://hdl.handle.net/21.11116/0000-0010-2924-8
Abstract
3D object pose estimation is a challenging task. Prior work typically
requires thousands of object images with annotated poses to learn the 3D
pose correspondence, and such annotations are laborious and time-consuming
to produce. In this paper, we propose to learn a category-level 3D object
pose estimator without pose annotations. Instead of using manually annotated
images, we leverage diffusion models (e.g., Zero-1-to-3) to generate a set of
images under controlled pose differences and propose to learn our object pose
estimator from those images. Directly using the original diffusion model
leads to images with noisy poses and artifacts. To tackle this issue, we
first exploit an image encoder, learned with a specially designed contrastive
pose learning objective, to filter out unreasonable details and extract image
feature maps. Second, we propose a novel learning strategy that allows the
model to learn object poses from those generated image sets without knowing
the alignment of their canonical poses. Experimental results show that our
method can perform category-level object pose estimation in a single-shot
setting (with the single shot serving as the pose definition), while
significantly outperforming other state-of-the-art methods on the few-shot
category-level object pose estimation benchmarks.
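
As an illustration of why image sets generated under controlled pose differences can supervise pose learning without a known canonical alignment, the following minimal sketch checks that the relative rotation between two views does not change when every view is composed with the same unknown canonical rotation. It is not the authors' implementation: the azimuth/elevation parameterization, the helper names (`view_rotation`, `relative_rotation`), and the offset values are assumptions chosen for illustration only.

```python
import math

def rot_y(deg):
    """Rotation about the vertical (y) axis by `deg` degrees (azimuth)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(deg):
    """Rotation about the horizontal (x) axis by `deg` degrees (elevation)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """3x3 transpose; equals the inverse for rotation matrices."""
    return [list(row) for row in zip(*a)]

def view_rotation(azimuth_deg, elevation_deg):
    """Hypothetical view rotation: azimuth first, then elevation."""
    return matmul(rot_x(elevation_deg), rot_y(azimuth_deg))

def relative_rotation(r_i, r_j):
    """Rotation taking view i to view j: R_j @ R_i^T."""
    return matmul(r_j, transpose(r_i))

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# Hypothetical controlled pose offsets (azimuth, elevation) in degrees,
# each relative to the input image of one generated set.
offsets = [(0.0, 0.0), (30.0, 0.0), (60.0, 10.0), (-45.0, -10.0)]
views = [view_rotation(az, el) for az, el in offsets]

# An arbitrary canonical alignment that the learner never observes.
canonical = view_rotation(137.0, 21.0)
views_shifted = [matmul(r, canonical) for r in views]

rel = relative_rotation(views[0], views[2])
rel_shifted = relative_rotation(views_shifted[0], views_shifted[2])
print(max_abs_diff(rel, rel_shifted))  # numerically close to zero
```

Because the canonical rotation cancels in `R_j @ R_i^T`, the pose differences within one generated set are well defined even when different sets have unaligned canonical poses, which is the property the learning strategy described above relies on.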