# Item


Released

Conference Paper

#### Reinforcement Learning for Operational Space Control

##### MPS-Authors

There are no MPG-Authors available for this publication.

##### External Resource

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4209397

(Publisher version)

##### Fulltext (restricted access)

There are currently no full texts shared for your IP range.

##### Fulltext (public)

There are no public full texts stored in PuRe.

##### Supplementary Material (public)

There is no public supplementary material available

##### Citation

Peters, J., & Schaal, S. (2007). Reinforcement Learning for Operational Space Control.
In *2007 IEEE International Conference on Robotics and Automation* (pp. 2111-2116). Los Alamitos,
CA, USA: IEEE Computer Society.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-CE27-C

##### Abstract

While operational space control is of essential importance for robotics and well understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in the face of modeling errors, which are inevitable in complex robots such as humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting supervised learning problem is ill-defined, as it requires learning an inverse mapping of a usually redundant system, which is well known to suffer from a non-convex solution space; that is, the learning system could generate motor commands that try to steer the robot into physically impossible configurations. The important insight that many operational space control algorithms can be reformulated as optimal control problems, however, allows addressing this inverse learning problem in the framework of reinforcement learning. Yet few of the known optimization or reinforcement learning algorithms can be used for online learning control on robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which is infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications to complex, high-degree-of-freedom robots.
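
The reward-weighted regression idea the abstract describes can be illustrated compactly. Below is a minimal sketch, not the authors' implementation: a linear-Gaussian policy u ~ N(θ·s, σ²) on a hypothetical one-dimensional immediate-reward task, where the E-step samples actions from the current policy, rewards are mapped to non-negative weights through an exponential transformation w = exp(τ·(r − r_max)) with a crudely adaptive temperature τ, and the M-step refits θ and σ by weighted least squares. The toy reward, the scalar feature, and the temperature schedule are all illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of EM-style reward-weighted regression (RWR) on a toy
# 1-D immediate-reward problem. NOT the authors' implementation; the
# quadratic toy reward, the linear-Gaussian policy, and the exponential
# reward transformation w = exp(tau * (r - r_max)) are assumptions made
# for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def reward(s, u):
    # Hypothetical immediate reward: penalize distance of the action u
    # from the target action 2*s (the gain 2 is unknown to the learner).
    return -(u - 2.0 * s) ** 2

theta, sigma = 0.0, 1.0  # linear-Gaussian policy: u ~ N(theta * s, sigma^2)

for _ in range(50):
    # E-step: sample states and actions from the current stochastic policy.
    s = rng.uniform(-1.0, 1.0, size=200)
    u = theta * s + sigma * rng.standard_normal(200)
    r = reward(s, u)

    # Reward transformation: a crudely adaptive temperature keeps the
    # weights well scaled; subtracting the max keeps exp() stable.
    tau = 1.0 / (r.std() + 1e-8)
    w = np.exp(tau * (r - r.max()))

    # M-step: reward-weighted regression, i.e., weighted least squares
    # for the policy mean and a weighted ML estimate of the noise.
    theta = np.sum(w * s * u) / np.sum(w * s * s)
    sigma = np.sqrt(np.sum(w * (u - theta * s) ** 2) / np.sum(w))

print(f"learned gain theta = {theta:.3f} (target: 2.0)")
```

Because the weights are non-negative and the M-step is an ordinary regression over actions the policy actually executed, each update stays within the sampled solution space, which is the property the abstract summarizes as learning "smoothly without dangerous jumps in solution space."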
