Abstract:
Autonomous robots that can assist humans in situations of daily life have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher-level instruction. However, learning techniques have yet to live up to this promise, as only a few methods manage to scale to high-dimensional manipulator or humanoid robots. In this talk, we investigate a general framework suitable for learning motor skills in robotics which is based on the principles behind many analytical robotics approaches. It involves generating a representation of motor skills by parameterized motor primitive policies acting as building blocks of movement generation, and a learned task execution module that transforms these movements into motor commands.
Learning parameterized motor primitives usually requires reward-related self-improvement, i.e., reinforcement learning. We propose a new, task-appropriate architecture, the Natural Actor-Critic. This algorithm is based on natural policy gradient updates for the actor, while the critic estimates the natural policy gradient. Empirical evaluations illustrate its effectiveness and its applicability to learning control on an anthropomorphic robot arm.
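To make the idea of a critic that estimates the natural policy gradient more concrete, the following is a minimal sketch and not the algorithm presented in the talk. It assumes that sampled returns are regressed linearly onto the policy's score functions (the gradients of the log-policy), in which case the regression weights serve as an estimate of the natural gradient. The function name `natural_gradient_from_scores`, its arguments, and the small `ridge` term are hypothetical choices made for illustration.

```python
import numpy as np

def natural_gradient_from_scores(scores, returns, ridge=1e-8):
    """Estimate a natural policy gradient by regressing returns on score functions.

    scores:  (n, d) array; row i is grad_theta log pi(u_i | x_i) for sample i
    returns: (n,) array of returns observed for the sampled actions
    ridge:   small regularizer (an assumption of this sketch) for numerical stability
    """
    scores = np.asarray(scores, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Use the score functions as features and append a constant column,
    # which plays the role of a baseline.
    Phi = np.hstack([scores, np.ones((scores.shape[0], 1))])
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    w = np.linalg.solve(A, Phi.T @ returns)
    # Drop the baseline weight; the remaining weights are the gradient estimate.
    return w[:-1]
```

An actor update under these assumptions would then take a step of the form `theta = theta + step_size * natural_gradient_from_scores(scores, returns)`, where `step_size` is a hypothetical learning rate.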
For the proper execution of motion, we need to learn how to realize the behavior prescribed by the motor primitives in their respective task space through the generation of motor commands. This transformation corresponds to solving the classical problem of operational space control through machine learning techniques. Such robot control problems can be reformulated as immediate reward reinforcement learning problems. We derive an EM-based reinforcement learning algorithm which reduces the problem of learning with immediate rewards to a reward-weighted regression problem. The resulting algorithm learns smoothly without dangerous jumps in solution space, and works well in application to complex high degree-of-freedom robots.
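As an illustration of what a reward-weighted regression step can look like, here is a minimal sketch under simplifying assumptions rather than the algorithm derived in the talk. It assumes a policy that is linear in its input features and non-negative immediate rewards, so that each sample's reward can act directly as its weight in a weighted least-squares fit. The function name `reward_weighted_regression`, its arguments, and the `ridge` term are hypothetical.

```python
import numpy as np

def reward_weighted_regression(X, U, rewards, ridge=1e-6):
    """Fit a linear policy u = X @ theta by least squares weighted by reward.

    X:       (n, d) array of input features (e.g. state and desired task-space command)
    U:       (n,) or (n, m) array of motor commands that were actually applied
    rewards: (n,) array of non-negative immediate rewards, used as sample weights
    ridge:   small regularizer (an assumption of this sketch) for numerical stability
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    w = np.asarray(rewards, dtype=float)
    if np.any(w < 0):
        raise ValueError("rewards must be non-negative to act as weights")
    # X.T * w scales column i of X.T by w[i], i.e. it equals X.T @ diag(w).
    XtW = X.T * w
    A = XtW @ X + ridge * np.eye(X.shape[1])
    b = XtW @ U
    return np.linalg.solve(A, b)
```

Because each update is the solution of a regression problem over the observed samples rather than a raw gradient step, samples with high reward pull the fit towards the commands that produced them, while samples with zero reward are ignored.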