Nonsmooth Control

Description & Equations

The environment used in this problem is a second-order system with distinct poles (at $s = -1$ and $s = -2$), defined by the following:

\begin{align} \nonumber \dfrac{d}{dt} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} u \end{align}

where $x = (x_1, x_2)^\top$ is the state vector and $u$ is the action variable.
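
The dynamics above can be checked numerically. The following is a minimal sketch, not the environment's own implementation: the function name `simulate`, the forward-Euler integrator, the step size `dt`, and the initial conditions are all assumptions chosen for illustration. It computes the poles as the eigenvalues of the system matrix (the roots of $s^2 + 3s + 2$) and integrates the system for a constant action.

```python
import numpy as np

# State-space matrices copied from the equation above.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([0.0, 1.0])

# The poles are the eigenvalues of A, i.e. the roots of s^2 + 3s + 2.
poles = np.sort(np.linalg.eigvals(A).real)


def simulate(x0, u, dt=0.01, n_steps=1000):
    """Integrate dx/dt = A x + B u with forward Euler for a constant action u.

    Hypothetical helper for illustration; the integrator and step size
    are assumptions, not taken from the environment.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x + dt * (A @ x + B * u)
    return x


# With no input, the stable system decays towards the origin.
x_unforced = simulate(x0=[1.0, 0.0], u=0.0)

# With a constant input u = 1, the steady state satisfies -2*x_1 + u = 0.
x_forced = simulate(x0=[0.0, 0.0], u=1.0, n_steps=2000)
```

Because both poles lie in the left half-plane, the unforced state decays to the origin, and a constant action $u$ settles at $x_1 = u/2$, $x_2 = 0$.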

Observation

The observation of the Nonsmooth Control environment provides information on the state variables and their associated setpoints (if they exist) at the current timestep. The observation is an array of shape (1, 2) of the form [x_1, x_sp], where x_sp is the setpoint for x_1.
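
As an illustration of the layout described above, the sketch below builds an array with the stated shape. The helper name `make_observation` and the example values are hypothetical and are not part of the environment's API.

```python
import numpy as np


def make_observation(x1, x1_setpoint):
    """Pack x_1 and its setpoint into a (1, 2) array (hypothetical helper)."""
    return np.array([[x1, x1_setpoint]], dtype=float)


# Example: x_1 = 0.3 with the setpoint at the origin.
obs = make_observation(0.3, 0.0)
```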

Action

The action space is a ContinuousBox of [-1,1].

Reward

The reward is a continuous value based on the squared error between the state and its setpoint. For multiple states, the squared errors are scaled by a factor r_scale and summed to give a single value. The goal of this environment is to drive the $x_1$ state to the origin.
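
A reward of this kind might be computed as in the sketch below. The function name `squared_error_reward` is hypothetical, and the negative sign (so that a smaller error gives a larger reward) is an assumption: the text above states only that the reward corresponds to the squared error.

```python
import numpy as np


def squared_error_reward(states, setpoints, r_scale):
    """Scaled sum of squared setpoint errors, negated (sign is an assumption)."""
    error = np.asarray(states, dtype=float) - np.asarray(setpoints, dtype=float)
    # Scale each state's squared error by its r_scale entry, then sum.
    return -float(np.sum(np.asarray(r_scale, dtype=float) * error**2))


# Single controlled state: x_1 = 0.5 with the setpoint at the origin.
reward = squared_error_reward(states=[0.5], setpoints=[0.0], r_scale=[1.0])
```

Under this sign convention the reward is at most zero and reaches zero only when $x_1$ sits exactly at its setpoint.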

Reference

The original model was created by Lim (1969).