Learning Control and Planning from the View of Control Theory and Imitation

Peters, J

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_1792220_2

Released

Talk

MPS-Authors

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Citation

Peters, J. (2003). Learning Control and Planning from the View of Control Theory and Imitation. Talk presented at NIPS 2003 Workshop "Planning for the Real World: The promises and challenges of dealing with uncertainty". Whistler, BC, Canada.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-DAB5-B

Abstract

Learning control and planning in high-dimensional continuous state-action systems, as needed for example in a humanoid robot, has so far been beyond the reach of generic planning techniques such as reinforcement learning and dynamic programming. This talk describes an approach we have taken to enable complex robotic systems to learn to accomplish control tasks. It combines three components. First, adaptive learning controllers equipped with statistical learning techniques are used to learn tracking controllers; missing state information and uncertainty in the state estimates are usually addressed by observers or direct adaptive control methods. Second, imitation learning seeds initial control policies whose output is a desired trajectory suitable for accomplishing the task at hand. Third, reinforcement learning with stochastic policy gradients, using a natural gradient, refines the initial control policy until the task is accomplished. Compared with general learning control, this approach is highly prestructured and thus more domain-specific. However, it appears to be a theoretically clean and feasible strategy for control systems of the complexity that we need to address.
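
The abstract's refinement step, a stochastic policy gradient rescaled by a natural gradient, can be illustrated on a toy problem. The sketch below is not the method presented in the talk: it assumes a one-dimensional Gaussian policy with fixed standard deviation, a made-up quadratic reward, and hypothetical names and parameter values (`natural_policy_gradient`, `target`, `sigma`, `lr`) chosen only for illustration. It estimates the likelihood-ratio gradient from sampled actions and divides it by an empirical Fisher information estimate.

```python
import random


def natural_policy_gradient(target=2.0, sigma=0.5, lr=0.1,
                            iters=200, batch=64, seed=0):
    """Toy natural policy gradient for a policy a ~ N(theta, sigma^2).

    The reward -(a - target)^2 is an assumption made for this sketch.
    Returns the learned policy mean theta.
    """
    rng = random.Random(seed)
    theta = 0.0  # initial policy mean (in the talk's setting, imitation would seed this)
    for _ in range(iters):
        # Sample a batch of actions from the current stochastic policy.
        actions = [rng.gauss(theta, sigma) for _ in range(batch)]
        rewards = [-(a - target) ** 2 for a in actions]
        baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance

        # Score function: d/dtheta log N(a | theta, sigma^2) = (a - theta) / sigma^2.
        scores = [(a - theta) / sigma ** 2 for a in actions]

        # Likelihood-ratio ("vanilla") policy gradient estimate.
        grad = sum(s * (r - baseline) for s, r in zip(scores, rewards)) / batch

        # Empirical Fisher information; a scalar here, a matrix in general.
        fisher = sum(s * s for s in scores) / batch

        # Natural gradient step: precondition the gradient by the inverse Fisher.
        theta += lr * grad / (fisher + 1e-8)
    return theta


if __name__ == "__main__":
    print(natural_policy_gradient())
```

With a single parameter the Fisher information reduces to a scalar, so the natural gradient only rescales the step. In a multi-parameter policy the same update would solve a linear system with the Fisher matrix instead of performing a division.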