visualizers.viser_app.tasks.cartpole#

Module Contents#

visualizers.viser_app.tasks.cartpole.XML_PATH#
class visualizers.viser_app.tasks.cartpole.CartpoleConfig#

Bases: jacta.visualizers.viser_app.tasks.task.TaskConfig

Reward configuration for the cartpole task.

default_command: Optional[numpy.ndarray]#

w_vertical: float = 10.0#
w_centered: float = 10.0#
w_velocity: float = 0.1#
w_control: float = 0.1#
p_vertical: float = 0.01#
p_centered: float = 0.1#
cutoff_time: float = 0.15#
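The weight (`w_*`) and sharpness (`p_*`) fields above can be gathered into a plain dataclass for experimentation. This is a hedged sketch mirroring the documented defaults, not the actual jacta `TaskConfig` subclass:

```python
from dataclasses import dataclass


@dataclass
class CartpoleRewardWeights:
    """Sketch mirroring the documented CartpoleConfig defaults."""

    w_vertical: float = 10.0   # weight on pole-angle distance from vertical
    w_centered: float = 10.0   # weight on cart distance from the origin
    w_velocity: float = 0.1    # weight on squared linear/angular velocity
    w_control: float = 0.1     # weight on actuation effort
    p_vertical: float = 0.01   # sharpness of the vertical penalty
    p_centered: float = 0.1    # sharpness of the centering penalty
    cutoff_time: float = 0.15


# Override a single weight while keeping the other defaults.
weights = CartpoleRewardWeights(w_vertical=20.0)
```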
class visualizers.viser_app.tasks.cartpole.Cartpole#

Bases: jacta.visualizers.viser_app.tasks.mujoco_task.MujocoTask[CartpoleConfig]

Defines the cartpole balancing task.

reward(states: numpy.ndarray, sensors: numpy.ndarray, controls: numpy.ndarray, config: CartpoleConfig, additional_info: dict[str, Any]) → numpy.ndarray#

Implements the cartpole reward from MJPC.

Maps a batch of state and control trajectories to a batch of rewards (summed over time), one per rollout.

The cartpole reward has four terms:

* `vertical_rew`, penalizing the distance between the pole angle and vertical.
* `centered_rew`, penalizing the distance from the cart to the origin.
* `velocity_rew`, penalizing squared linear and angular velocity.
* `control_rew`, penalizing any actuation.

Since we return rewards, each penalty term is returned as negative. The max reward is zero.

Returns:

An array of rewards with shape (batch_size,), where the entry at index i is the total reward for the i-th batched trajectory.
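The four-term structure described above can be illustrated with a small NumPy sketch. The state layout, the exact penalty shapes, and the function name here are assumptions for illustration; the actual MJPC penalties use different smoothing:

```python
import numpy as np


def cartpole_reward(states, controls, w_vertical=10.0, w_centered=10.0,
                    w_velocity=0.1, w_control=0.1):
    """Sketch of the four penalty terms, summed over time.

    Assumed layout (not the jacta API): states is (batch, T, 4) holding
    [cart_pos, pole_angle, cart_vel, pole_angvel], with angle 0 = upright;
    controls is (batch, T, 1).
    """
    pos, theta, vel, omega = np.moveaxis(states, -1, 0)
    vertical_rew = -w_vertical * np.abs(np.cos(theta) - 1.0)  # 0 when upright
    centered_rew = -w_centered * np.abs(pos)                  # 0 at origin
    velocity_rew = -w_velocity * (vel**2 + omega**2)
    control_rew = -w_control * np.sum(controls**2, axis=-1)
    # Each term is a penalty (non-positive), so the max total reward is zero.
    return np.sum(vertical_rew + centered_rew + velocity_rew + control_rew,
                  axis=-1)
```

At the goal state (cart at the origin, pole upright, no motion, no actuation) every term vanishes and the reward is exactly zero; any deviation makes it negative.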

reset() → None#

Resets the model to a default (random) state.

Return type:

None

is_terminated(config: CartpoleConfig) → bool#

Termination condition for cartpole: the episode ends once the position and velocity are small enough.

Parameters:

config (CartpoleConfig) –

Return type:

bool
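The termination check can be sketched as thresholding the position and velocity components of the state. The state layout and tolerance values here are hypothetical, chosen only to illustrate the "small enough" condition:

```python
def is_terminated(state, pos_tol=1e-2, vel_tol=1e-2):
    """Sketch: end the episode when the cart is near the origin, the pole
    is near vertical, and both velocities are small.

    Assumed state layout: [cart_pos, pole_angle, cart_vel, pole_angvel],
    with angle 0 = upright. Tolerances are illustrative, not from jacta.
    """
    pos, theta, vel, omega = state
    small_position = abs(pos) < pos_tol and abs(theta) < pos_tol
    small_velocity = abs(vel) < vel_tol and abs(omega) < vel_tol
    return small_position and small_velocity
```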