learning.learner#
Learner module encapsulating the Deep Deterministic Policy Gradient (DDPG) algorithm.
Learner initializes the actor, critic, and normalizers and takes care of checkpoints during training as well as network loading if starting from pre-trained networks.
Module Contents#
- class learning.learner.Learner(plant: jacta.planner.dynamics.simulator_plant.SimulatorPlant, graph: jacta.planner.core.graph.Graph, replay_buffer: jacta.learning.replay_buffer.ReplayBuffer, params: jacta.planner.core.parameter_container.ParameterContainer, save_local: bool = True, load_local: bool = False, verbose: bool = True)#
Deep Deterministic Policy Gradient algorithm class.
Uses a state/goal normalizer and the HER (hindsight experience replay) sampling method to solve sparse reward environments. A minimal usage sketch is given below.
- Parameters:
plant (jacta.planner.dynamics.simulator_plant.SimulatorPlant) –
graph (jacta.planner.core.graph.Graph) –
replay_buffer (jacta.learning.replay_buffer.ReplayBuffer) –
params (jacta.planner.core.parameter_container.ParameterContainer) –
save_local (bool) –
load_local (bool) –
verbose (bool) –
- reset() → None #
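The snippet below is a minimal usage sketch, not part of this module: the plant, graph, replay buffer, and parameter container are assumed to have been constructed elsewhere in the jacta planner, and only the Learner methods shown (train, eval_agent, save_models) come from this page.

```python
# Hypothetical setup: plant, graph, replay_buffer, and params are assumed
# to be built by the jacta planner before the Learner is created.
from jacta.learning.learner import Learner

learner = Learner(plant, graph, replay_buffer, params,
                  save_local=True, load_local=False, verbose=True)

learner.train(num_epochs=50)    # DDPG training with HER resampling
results = learner.eval_agent()  # averaged over repeated evaluation rollouts
learner.save_models()           # checkpoint actor, critic, and normalizers
```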
- actor_actions(actor: jacta.learning.networks.Actor, node_ids: torch.IntTensor, action_time_step: float) → torch.FloatTensor #
- Parameters:
actor (jacta.learning.networks.Actor) –
node_ids (torch.IntTensor) –
action_time_step (float) –
- Return type:
torch.FloatTensor
- relative_distances_to(data_container: jacta.planner.core.graph.Graph | jacta.learning.replay_buffer.ReplayBuffer, ids: torch.IntTensor, target_states: torch.FloatTensor) → torch.FloatTensor #
- Parameters:
data_container (Union[jacta.planner.core.graph.Graph, jacta.learning.replay_buffer.ReplayBuffer]) –
ids (torch.IntTensor) –
target_states (torch.FloatTensor) –
- Return type:
torch.FloatTensor
- reward_function(data_container: jacta.planner.core.graph.Graph | jacta.learning.replay_buffer.ReplayBuffer, node_ids: torch.FloatTensor, goals: torch.FloatTensor) → Tuple[torch.FloatTensor, torch.FloatTensor] #
- Parameters:
data_container (Union[jacta.planner.core.graph.Graph, jacta.learning.replay_buffer.ReplayBuffer]) –
node_ids (torch.FloatTensor) –
goals (torch.FloatTensor) –
- Return type:
Tuple[torch.FloatTensor, torch.FloatTensor]
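reward_function has no docstring here, but the class docstring describes sparse reward environments; the sketch below shows a common goal-conditioned sparse reward that returns a (reward, success) pair. It is illustrative only: the function name, the distance input, and the tolerance are assumptions, not the module's actual implementation.

```python
from typing import Tuple

import torch


def sparse_goal_reward(
    distances: torch.FloatTensor, tolerance: float = 0.05
) -> Tuple[torch.FloatTensor, torch.FloatTensor]:
    # Sparse reward: 0 when within tolerance of the goal, -1 otherwise.
    successes = (distances < tolerance).float()
    rewards = successes - 1.0
    return rewards, successes
```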
- planner_exploration(root_states: torch.FloatTensor) → torch.FloatTensor #
- Parameters:
root_states (torch.FloatTensor) –
- Return type:
torch.FloatTensor
- update_norm(states: torch.FloatTensor, goals: torch.FloatTensor) → None #
Update the normalizers with the current episode of play experience.
Samples from the trajectory instead of taking every experience, so that the goal distribution matches what the networks encounter.
- Parameters:
states (torch.FloatTensor) –
goals (torch.FloatTensor) –
- Return type:
None
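For context, the sketch below shows one common way such a state/goal normalizer can track running statistics. It is a stand-in under assumed names, not the normalizer class used by this module, and it only illustrates the statistics update itself, not the trajectory sampling described above.

```python
import torch


class RunningNormalizer:
    """Illustrative running mean/std normalizer (a stand-in, not the real class)."""

    def __init__(self, size: int, eps: float = 1e-2) -> None:
        self.count = 0
        self.mean = torch.zeros(size)
        self.m2 = torch.zeros(size)  # running sum of squared deviations (Welford)
        self.eps = eps

    def update(self, samples: torch.FloatTensor) -> None:
        # Welford-style update over a batch of sampled states or goals.
        for x in samples:
            self.count += 1
            delta = x - self.mean
            self.mean = self.mean + delta / self.count
            self.m2 = self.m2 + delta * (x - self.mean)

    def normalize(self, x: torch.FloatTensor) -> torch.FloatTensor:
        std = torch.sqrt(self.m2 / max(self.count, 1)) + self.eps
        return (x - self.mean) / std
```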
- policy_rollout(temporary: bool = False) → Tuple[torch.FloatTensor, bool] #
- Parameters:
temporary (bool) –
- Return type:
Tuple[torch.FloatTensor, bool]
- graph_rollout(temporary: bool = False) → torch.FloatTensor #
- Parameters:
temporary (bool) –
- Return type:
torch.FloatTensor
- set_demonstration_injection(final_success_rate: float) → None #
- Parameters:
final_success_rate (float) –
- Return type:
None
- train(num_epochs: int = 50) → None #
Train a policy to solve the environment with DDPG.
Trajectories are resampled with HER to solve sparse reward environments.
- Parameters:
num_epochs (int) –
- Return type:
None
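Since train() relies on HER resampling, the sketch below illustrates the "future" relabeling strategy commonly used with HER: a fraction of goals in an episode is replaced by goals actually achieved later in that same episode. Function and argument names are assumptions; this is not the module's own resampling code.

```python
import torch


def her_relabel_future(
    goals: torch.FloatTensor,     # (T, goal_dim) original goals of one episode
    achieved: torch.FloatTensor,  # (T, goal_dim) goals achieved at each step
    relabel_fraction: float = 0.8,
) -> torch.FloatTensor:
    """Illustrative 'future' HER relabeling of a single episode."""
    T = goals.shape[0]
    relabeled = goals.clone()
    mask = torch.rand(T) < relabel_fraction
    # For each relabeled step t, pick a random index in [t, T).
    future = (torch.rand(T) * (T - torch.arange(T))).long() + torch.arange(T)
    relabeled[mask] = achieved[future[mask]]
    return relabeled
```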
- state_action_training_data(num_trajectories: int = 1000, discount_factor: float = 0.98) → Tuple[torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor] #
- Parameters:
num_trajectories (int) –
discount_factor (float) –
- Return type:
Tuple[torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor]
- pretrain(num_epochs: int = 100, num_trajectories: int = 1000, train_critic: bool = True) → None #
- Parameters:
num_epochs (int) –
num_trajectories (int) –
train_critic (bool) –
- Return type:
None
- eval_agent() → Tuple[float, float, float] #
Evaluate the current agent performance on the task.
Runs learner_evals times and averages the success rate.
- Return type:
Tuple[float, float, float]
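The loop below is a minimal sketch of that kind of averaged evaluation, assuming policy_rollout(temporary=True) returns a (trajectory, success) pair as its signature suggests and does not modify the replay buffer; both assumptions and the function name are illustrative.

```python
def evaluate(learner, num_evals: int) -> float:
    """Illustrative evaluation loop: average success over repeated rollouts."""
    successes = []
    for _ in range(num_evals):
        # Assumed semantics: temporary rollouts leave training data untouched.
        _, success = learner.policy_rollout(temporary=True)
        successes.append(float(success))
    return sum(successes) / max(num_evals, 1)
```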
- save_models() → None #
Save the actor and critic networks and the normalizers.
Saves are located under /models/<model_filename>/.
- Return type:
None
- load_models(path: str) → None #
Load the actor and critic networks and the normalizers.
- Parameters:
path (str) –
- Return type:
None
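To make the save/load round trip concrete, here is a hedged sketch of one possible checkpoint layout using torch.save and torch.load. The attribute names (actor, critic, state_normalizer) and checkpoint keys are assumptions for illustration; the module's actual file layout under /models/<model_filename>/ is not reproduced here.

```python
import torch


def save_checkpoint(learner, path: str) -> None:
    # Hypothetical checkpoint layout; attribute and key names are assumptions.
    torch.save(
        {
            "actor": learner.actor.state_dict(),
            "critic": learner.critic.state_dict(),
            "state_norm_mean": learner.state_normalizer.mean,
            "state_norm_std": learner.state_normalizer.std,
        },
        path,
    )


def load_checkpoint(learner, path: str) -> None:
    checkpoint = torch.load(path)
    learner.actor.load_state_dict(checkpoint["actor"])
    learner.critic.load_state_dict(checkpoint["critic"])
    learner.state_normalizer.mean = checkpoint["state_norm_mean"]
    learner.state_normalizer.std = checkpoint["state_norm_std"]
```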