Optimal Control for One-link pendulum swing-up

Description

One-link pendulum swing-up
In one-link pendulum swing-up, a motor at the base of the pendulum swings a rigid arm from the stable downward equilibrium to the unstable upright equilibrium and balances the arm there. What makes this challenging is that the one-step cost function penalizes both the amount of torque used and the deviation of the current position from the goal, and the controller must minimize the total cost of the trajectory. The one-step cost function for this example is a weighted sum of the squared position error (the difference between the current angle and the goal angle) and the squared torque, $L(x, u) = 0.1\,\theta^2 T + \tau^2 T$, where $\theta$ is the position error, $\tau$ is the torque, 0.1 weights the position error relative to the torque penalty, and $T$ is the time step of the simulation (0.01 s). There is no cost associated with the joint velocity.
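
As an illustration of the cost described above, the following is a minimal Python sketch of the one-step cost and the total trajectory cost. The function names and the convention that the position error is the current angle minus the goal angle (in radians) are assumptions made for this example; only the 0.1 weight and the 0.01 s time step come from the text.

```python
T = 0.01      # simulation time step in seconds (from the text)
W_POS = 0.1   # weight on squared position error relative to torque (from the text)


def one_step_cost(theta_err, torque, dt=T):
    """Weighted squared position error plus squared torque, scaled by dt.

    theta_err is assumed to be (current angle - goal angle) in radians.
    Joint velocity does not appear: it carries no cost.
    """
    return (W_POS * theta_err ** 2 + torque ** 2) * dt


def trajectory_cost(theta_errs, torques, dt=T):
    """Total cost of a trajectory: the sum of its one-step costs."""
    return sum(one_step_cost(e, u, dt) for e, u in zip(theta_errs, torques))
```

With these weights, a torque of a given magnitude costs ten times as much as a position error of the same magnitude, which is why a controller cannot simply apply a large torque to reach the goal quickly.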

Trajectory Optimization

Trajectory optimization in AMPL. Code here
Trajectory optimization in MATLAB. Code here

Controller Design

Controller Indexed by Time

Description:

A controller indexed by time takes the form
$\tau = \tau_{ff}(t) + p\,(\theta_d(t) - \theta) + v\,(\dot{\theta}_d(t) - \dot{\theta})$
where $\tau$ is the driving torque, $\theta_d(t)$ and $\dot{\theta}_d(t)$ are the optimal trajectory, $\tau_{ff}(t)$ is the feedforward optimal torque, and $p$ and $v$ are the position and velocity feedback gains.
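
A time-indexed controller of this kind might be sketched as follows. This is an assumption-laden illustration, not the page's implementation: the feedback form (feedforward torque plus position and velocity gains `p` and `v` acting on the deviation from the stored trajectory), the sign convention, the table layout, and the function name are all hypothetical.

```python
def time_indexed_torque(t, theta, theta_dot, traj, p=0.0, v=0.0, dt=0.01):
    """Torque at time t from a stored trajectory plus optional feedback.

    traj is assumed to be a list of (theta_d, theta_dot_d, tau_ff) samples
    spaced dt seconds apart, starting at t = 0.
    """
    # Index of the stored sample for time t, clamped to the stored range
    k = min(max(int(round(t / dt)), 0), len(traj) - 1)
    theta_d, theta_dot_d, tau_ff = traj[k]
    # Feedforward torque plus feedback on the deviation from the trajectory
    return tau_ff + p * (theta_d - theta) + v * (theta_dot_d - theta_dot)
```

With `p = 0` and `v = 0` the function returns only the stored feedforward torque, so the controller runs open loop and depends on time alone.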

Synthesis:

Controller's structure

Result:

Dynamic simulation where p=0 and v=0

Controller Indexed by State

Description:

The controller takes the form
$\tau = \pi(\theta, \dot{\theta})$
where $\tau$ is the driving torque, and $\theta$ and $\dot{\theta}$ are the current position and velocity. To obtain the optimal control policy, we generate optimal trajectories from a grid of starting points and use the first torque $\tau$ of each trajectory as the optimal control for the state at its starting point. Each trajectory is locally optimized using SNOPT. Information is exchanged between trajectories to enable convergence to globally optimal trajectories [1].
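
The grid-based construction described above could be sketched roughly as below. The names `build_policy_table`, `policy_torque`, and the `optimize_trajectory` callback are hypothetical; the callback stands in for whatever trajectory optimizer is used and is assumed to return the torque sequence of the optimal trajectory from a given start state. The exchange of information between trajectories is not modeled, and the nearest-grid-point lookup is a simplification chosen for brevity.

```python
def build_policy_table(theta_grid, theta_dot_grid, optimize_trajectory):
    """Map each grid state to the first torque of the trajectory starting there."""
    table = {}
    for th in theta_grid:
        for thd in theta_dot_grid:
            # Hypothetical optimizer callback: returns a sequence of torques
            torques = optimize_trajectory(th, thd)
            table[(th, thd)] = torques[0]
    return table


def policy_torque(table, theta, theta_dot):
    """Return the torque stored at the grid state nearest to (theta, theta_dot).

    Uses an unweighted squared distance over angle and velocity; a real
    controller would more likely interpolate between neighboring grid states.
    """
    nearest = min(
        table,
        key=lambda s: (s[0] - theta) ** 2 + (s[1] - theta_dot) ** 2,
    )
    return table[nearest]
```

Because the table is indexed by state rather than by time, the same lookup applies wherever the pendulum ends up after a disturbance.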

Synthesis:

Optimal policy
Value function
Optimal trajectory and policy

1. Atkeson, C. G.; Stephens, B. J., "Random Sampling of States in Dynamic Programming," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 4, pp. 924-929, Aug. 2008.