visualizers.viser_app.tasks.cartpole

Module Contents
- visualizers.viser_app.tasks.cartpole.XML_PATH
- class visualizers.viser_app.tasks.cartpole.CartpoleConfig

  Bases: jacta.visualizers.viser_app.tasks.task.TaskConfig

  Reward configuration for the cartpole task.

  - default_command: Optional[numpy.ndarray]
- w_vertical: float = 10.0
- w_centered: float = 10.0
- w_velocity: float = 0.1
- w_control: float = 0.1
- p_vertical: float = 0.01
- p_centered: float = 0.1
- cutoff_time: float = 0.15
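The attributes above are plain weight and shaping parameters with defaults. As a hedged illustration of how such a config can be held and overridden, here is a minimal standalone dataclass mirroring the documented fields (a sketch only; the real `CartpoleConfig` inherits from `TaskConfig`, whose constructor may differ):

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


# Hypothetical standalone mirror of the documented CartpoleConfig fields.
@dataclass
class CartpoleConfigSketch:
    default_command: Optional[np.ndarray] = None
    w_vertical: float = 10.0   # weight on pole-angle-from-vertical penalty
    w_centered: float = 10.0   # weight on cart-distance-from-origin penalty
    w_velocity: float = 0.1    # weight on squared-velocity penalty
    w_control: float = 0.1     # weight on actuation penalty
    p_vertical: float = 0.01   # shaping parameter for the vertical term
    p_centered: float = 0.1    # shaping parameter for the centered term
    cutoff_time: float = 0.15


# Defaults can be overridden per experiment, e.g. a heavier control penalty:
config = CartpoleConfigSketch(w_control=1.0)
```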
- class visualizers.viser_app.tasks.cartpole.Cartpole

  Bases: jacta.visualizers.viser_app.tasks.mujoco_task.MujocoTask[CartpoleConfig]

  Defines the cartpole balancing task.

  - reward(states: numpy.ndarray, sensors: numpy.ndarray, controls: numpy.ndarray, config: CartpoleConfig, additional_info: dict[str, Any]) -> numpy.ndarray
Implements the cartpole reward from MJPC.
Maps batched states and controls to a batch of rewards, summed over time for each rollout.
The cartpole reward has four terms:
- `vertical_rew`, penalizing the distance between the pole angle and vertical.
- `centered_rew`, penalizing the distance from the cart to the origin.
- `velocity_rew`, penalizing squared linear and angular velocity.
- `control_rew`, penalizing any actuation.
Since we return rewards, each penalty term is negated; the maximum reward is zero.
- returns:
  An array of rewards shaped (batch_size,), where the reward at index i is the total reward for the i-th batched trajectory.
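To make the four-term structure concrete, here is a hedged sketch of a batched reward with this shape. The state layout `[cart_pos, pole_angle, cart_vel, pole_vel]` and the exact penalty forms are assumptions for illustration, not the library's implementation (which follows MJPC and also uses the `p_*` shaping parameters):

```python
import numpy as np


def reward_sketch(states, controls, w_vertical=10.0, w_centered=10.0,
                  w_velocity=0.1, w_control=0.1):
    """Batched cartpole reward sketch.

    states: (batch, time, 4) with assumed layout
            [cart_pos, pole_angle, cart_vel, pole_vel]
    controls: (batch, time, nu)
    Returns rewards of shape (batch,), summed over time.
    """
    pos, theta = states[..., 0], states[..., 1]
    vel = states[..., 2:4]
    # Each term is a negated penalty, so the maximum reward is zero.
    vertical_rew = -w_vertical * (np.cos(theta) - 1.0) ** 2  # 0 when upright
    centered_rew = -w_centered * pos ** 2
    velocity_rew = -w_velocity * np.sum(vel ** 2, axis=-1)
    control_rew = -w_control * np.sum(controls ** 2, axis=-1)
    per_step = vertical_rew + centered_rew + velocity_rew + control_rew
    return per_step.sum(axis=-1)  # (batch,)
```

A balanced, centered, motionless rollout with zero actuation scores exactly zero; any deviation makes the reward negative.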
- reset() -> None

  Resets the model to a default (random) state.

  - Return type: None
- is_terminated(config: CartpoleConfig) -> bool

  Termination condition for cartpole: terminate once position and velocity are small enough.

  - Parameters: config (CartpoleConfig)
  - Return type: bool
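The termination rule above can be sketched as a simple threshold check. The state layout and the tolerance values below are assumptions for illustration; the real method reads the state from the MuJoCo model and its thresholds may differ:

```python
import numpy as np


def is_terminated_sketch(state, pos_tol=1e-2, vel_tol=1e-2):
    """Return True once position and velocity are both small enough.

    state: assumed layout [cart_pos, pole_angle, cart_vel, pole_vel],
    with pole_angle measured from vertical. Tolerances are hypothetical.
    """
    position_small = abs(state[0]) < pos_tol and abs(state[1]) < pos_tol
    velocity_small = np.all(np.abs(state[2:]) < vel_tol)
    return bool(position_small and velocity_small)
```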