visualizers.viser_app.controllers.sampling.cross_entropy_method#

Module Contents#

class visualizers.viser_app.controllers.sampling.cross_entropy_method.CrossEntropyConfig#

Bases: jacta.visualizers.viser_app.controllers.sampling_base.SamplingBaseConfig

Configuration for the cross-entropy method.

sigma_min: float = 0.1#

sigma_max: float = 1.0#
num_elites: int = 2#
horizon: float = 2.8#
num_rollouts: int = 32#
noise_ramp: float = 2.5#
use_noise_ramp: bool = True#
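These hyperparameters map onto the usual cross-entropy-method loop: sample num_rollouts control sequences around the current mean, keep the num_elites highest-reward candidates, and refit the sampling distribution while clamping its standard deviation between sigma_min and sigma_max. The sketch below illustrates that generic loop in NumPy; it is not the library's implementation. evaluate_rewards is a hypothetical stand-in for the task's rollout/reward evaluation, and the horizon-scaled interpretation of noise_ramp is an assumption.

import numpy as np

def cem_update(mean, sigma, evaluate_rewards, config):
    """One generic cross-entropy-method update step (illustrative sketch only).

    mean, sigma : (num_steps, action_dim) arrays parameterizing a Gaussian
        over control sequences.
    evaluate_rewards : hypothetical callable mapping candidate control
        sequences of shape (num_rollouts, num_steps, action_dim) to a
        reward vector of shape (num_rollouts,).
    """
    num_steps, action_dim = mean.shape

    # Assumed interpretation of the noise ramp: widen the sampling noise
    # along the planning horizon so later timesteps are explored more.
    if config.use_noise_ramp:
        ramp = np.linspace(1.0, config.noise_ramp, num_steps)[:, None]
        sigma = np.clip(sigma * ramp, config.sigma_min, config.sigma_max)

    # Sample num_rollouts candidate control sequences around the current mean.
    noise = np.random.randn(config.num_rollouts, num_steps, action_dim) * sigma
    candidates = mean[None] + noise

    # Score every candidate via rollouts.
    rewards = evaluate_rewards(candidates)

    # Keep the num_elites best candidates.
    elite_idx = np.argsort(rewards)[-config.num_elites:]
    elites = candidates[elite_idx]

    # Refit the Gaussian to the elites, keeping sigma in [sigma_min, sigma_max].
    new_mean = elites.mean(axis=0)
    new_sigma = np.clip(elites.std(axis=0), config.sigma_min, config.sigma_max)
    return new_mean, new_sigma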
class visualizers.viser_app.controllers.sampling.cross_entropy_method.CrossEntropyMethod(task: jacta.visualizers.viser_app.tasks.task.Task, config: CrossEntropyConfig, reward_config: jacta.visualizers.viser_app.tasks.task.TaskConfig)#

Bases: jacta.visualizers.viser_app.controllers.sampling_base.SamplingBase

The cross-entropy method.

Parameters:
  • config (CrossEntropyConfig) – configuration object with hyperparameters for the planner.

  • model – MuJoCo model of the system being controlled.

  • data – current configuration data for the MuJoCo model.

  • reward_func – function mapping batches of states/controls to batches of rewards.

  • task (jacta.visualizers.viser_app.tasks.task.Task) –

  • reward_config (jacta.visualizers.viser_app.tasks.task.TaskConfig) –

update_action(curr_state: numpy.ndarray, curr_time: float, additional_info: dict[str, Any]) None#

Performs rollouts and reward computation from the current state.

Parameters:
  • curr_state (numpy.ndarray) –

  • curr_time (float) –

  • additional_info (dict[str, Any]) –

Return type:

None

action(time: float) numpy.ndarray#

Current best action of the policy.

Parameters:

time (float) –

Return type:

numpy.ndarray
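A hypothetical usage pattern, assuming a concrete Task instance and its TaskConfig are already available: construct the controller once, then alternate between update_action (replanning from the latest state) and action (querying the current best control). The names my_task, my_task_config, running, read_sim_state, and apply_to_actuators below are placeholders, not part of this API, and the keyword-style config construction is an assumption.

import numpy as np

# Hypothetical sketch; my_task / my_task_config stand in for a concrete
# Task subclass and its TaskConfig, neither of which is defined on this page.
config = CrossEntropyConfig(num_rollouts=64, num_elites=4)  # assumes dataclass-style keyword init
controller = CrossEntropyMethod(task=my_task, config=config, reward_config=my_task_config)

while running:                                        # placeholder loop condition
    curr_state, curr_time = read_sim_state()          # placeholder: state (np.ndarray), time (float)
    controller.update_action(curr_state, curr_time, additional_info={})
    apply_to_actuators(controller.action(curr_time))  # placeholder actuation step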