Learning Visual Representations for Perception-Action Systems

Piater, J.; Jodogne, S.; Detry, R.; Kraft, R.; Krüger, N.; Kroemer, O.; Peters, J.

doi:10.1177/0278364910382464

Released

Journal Article

MPS-Authors

Kroemer, O.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

http://journals.sagepub.com/doi/pdf/10.1177/0278364910382464
(Publisher version)

Fulltext (public)

There are no public fulltexts available.

Supplementary Material (public)

There is no public supplementary material available

Citation

Piater, J., Jodogne, S., Detry, R., Kraft, R., Krüger, N., Kroemer, O., & Peters, J. (2011). Learning Visual Representations for Perception-Action Systems. The International Journal of Robotics Research, 30(3), 294-307. doi:10.1177/0278364910382464.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-BC94-5

Abstract

We discuss vision as a sensory modality for systems that interact flexibly with uncontrolled environments. Instead of trying to build a generic vision system that produces task-independent representations, we argue in favor of task-specific, learnable representations. This concept is illustrated by two examples of our own work. First, our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension, RLJC, additionally handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a non-parametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.
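To make the idea of a grasp density more concrete, here is a minimal sketch of a non-parametric (kernel density) representation of grasp success over gripper poses. It assumes that a pose is a 3D position plus a unit quaternion, that each successful grasp trial contributes one kernel, and that the kernel is the product of an isotropic Gaussian on position and an unnormalized quaternion kernel that treats q and -q as the same orientation. The class name `GraspDensity`, the bandwidths `sigma` and `kappa`, and the choice to discard failed trials are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Assumed pose representation for this sketch: (xyz position, unit quaternion).
Pose = Tuple[Sequence[float], Sequence[float]]


def position_kernel(x: Sequence[float], mu: Sequence[float], sigma: float) -> float:
    """Normalized isotropic 3D Gaussian kernel centred on mu."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mu))
    norm = (2.0 * math.pi * sigma ** 2) ** 1.5
    return math.exp(-sq / (2.0 * sigma ** 2)) / norm


def orientation_kernel(q: Sequence[float], mu: Sequence[float], kappa: float) -> float:
    """Unnormalized kernel on unit quaternions; abs() makes q and -q equivalent."""
    dot = abs(sum(a * b for a, b in zip(q, mu)))
    return math.exp(kappa * (dot - 1.0))  # equals 1.0 when orientations coincide


class GraspDensity:
    """Hypothetical kernel density over gripper poses built from grasp trials."""

    def __init__(self, sigma: float = 0.01, kappa: float = 50.0) -> None:
        self.sigma = sigma  # position bandwidth (illustrative value)
        self.kappa = kappa  # orientation concentration (illustrative value)
        self.particles: List[Pose] = []

    def add_experience(self, pose: Pose, success: bool) -> None:
        # In this simplified version only successful trials become kernels.
        if success:
            self.particles.append(pose)

    def evaluate(self, pose: Pose) -> float:
        """Mean of product kernels centred on the stored successful poses."""
        if not self.particles:
            return 0.0
        x, q = pose
        total = sum(
            position_kernel(x, px, self.sigma) * orientation_kernel(q, pq, self.kappa)
            for px, pq in self.particles
        )
        return total / len(self.particles)


# Usage: one successful and one failed trial, then score two candidate poses.
density = GraspDensity()
identity = (1.0, 0.0, 0.0, 0.0)
density.add_experience(((0.0, 0.0, 0.1), identity), success=True)
density.add_experience(((0.5, 0.0, 0.1), identity), success=False)
near = density.evaluate(((0.0, 0.0, 0.1), identity))
far = density.evaluate(((0.2, 0.0, 0.1), identity))
```

Because the representation is a sum of kernels, adding a new grasp experience only appends a particle; no parametric model has to be refit. Evaluating the density at candidate poses for a detected object would then rank grasping options, in the spirit of the abstract's last sentence.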