learning.networks#
The actor module contains the actor class as well as the actor networks.
The Actor class wraps the underlying deterministic policy network and provides
action selection and loading utilities.
DDP is a vanilla deep deterministic policy network implementation.
Module Contents#
- learning.networks.soft_update(network: torch.nn.Module, target: torch.nn.Module, tau: float) torch.nn.Module#
Perform a soft update of the target network’s weights.
Shifts the weights of the target by a factor of tau in the direction of the network.
- Parameters:
network (torch.nn.Module) – Network from which to copy the weights.
target (torch.nn.Module) – Network that gets updated.
tau (float) – Controls how much the weights are shifted. Valid in [0, 1].
- Returns:
The updated target network.
- Return type:
torch.nn.Module
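The soft update described above can be sketched as follows; this is a minimal re-implementation for illustration, not the module's actual code:

```python
import copy

import torch


def soft_update(network: torch.nn.Module, target: torch.nn.Module, tau: float) -> torch.nn.Module:
    """Shift each target parameter by a factor of tau toward the source network."""
    with torch.no_grad():
        for param, target_param in zip(network.parameters(), target.parameters()):
            # target <- (1 - tau) * target + tau * network
            target_param.data.mul_(1.0 - tau).add_(tau * param.data)
    return target


# With a small tau the target tracks the source network slowly;
# with tau=1.0 it becomes an exact copy.
net = torch.nn.Linear(2, 2)
tgt = copy.deepcopy(net)
soft_update(net, tgt, tau=0.05)
```

Keeping `tau` small (the module defaults to 0.05) stabilizes training by preventing the target network from chasing the online network too quickly.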
- class learning.networks.Actor(size_s: int, size_a: int, nlayers: int = 4, layer_width: int = 256, lr: float = 0.001, eps: float = 0.3, action_clip: float = 1.0)#
Actor class encapsulating the action selection and training process for the DDPG actor.
- select_action(state_norm: jacta.learning.normalizer.Normalizer, state: torch.FloatTensor, goal: torch.FloatTensor, exploration_function: Optional[Callable] = None) torch.FloatTensor#
Select an action for the given input states.
If in train mode, samples noise and chooses completely random actions with probability
self.eps. If in evaluation mode, only clips the action to the maximum value.
- Parameters:
states (torch.FloatTensor) – Input states.
- Returns:
An action tensor.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
lr (float) –
eps (float) –
action_clip (float) –
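The eps-greedy selection logic described for select_action can be sketched as below. The function name and argument layout are illustrative; only the behavior (random action with probability eps in train mode, clipping otherwise) follows the documentation:

```python
import torch


def select_action(policy_action: torch.Tensor, eps: float, action_clip: float,
                  train_mode: bool) -> torch.Tensor:
    """Illustrative eps-greedy selection: with probability eps in train mode,
    return a uniformly random action; otherwise clip the policy action."""
    if train_mode and torch.rand(1).item() < eps:
        # uniform random action in [-action_clip, action_clip]
        return (torch.rand_like(policy_action) * 2 - 1) * action_clip
    return policy_action.clamp(-action_clip, action_clip)
```

In evaluation mode the branch is never taken, so the output is always the clipped deterministic action.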
- __call__(states: torch.FloatTensor) torch.FloatTensor#
Run a forward pass directly on the action net.
- Parameters:
states (torch.FloatTensor) – Input states.
- Returns:
An action tensor.
- Return type:
torch.FloatTensor
- backward_step(loss: torch.FloatTensor) None#
Perform a backward pass with an optimizer step.
- Parameters:
loss (torch.FloatTensor) – Actor network loss.
- Return type:
None
- target(states: torch.FloatTensor) torch.FloatTensor#
Compute actions with the target network and without noise.
- Parameters:
states (torch.FloatTensor) – Input states.
- Returns:
An action tensor.
- Return type:
torch.FloatTensor
- eval() None#
Set the actor to eval mode without noise in the action selection.
- Return type:
None
- train() None#
Set the actor to train mode with noisy actions.
- Return type:
None
- update_target(tau: float = 0.05) None#
Update the target network with a soft parameter transfer update.
- Parameters:
tau (float) – Averaging fraction of the parameter update for the action network.
- Return type:
None
- load(checkpoint: Any) None#
Load data for the actor.
- Parameters:
checkpoint (Any) – dict containing loaded data.
- Return type:
None
- save(f: io.BufferedWriter) None#
Save data for the actor.
- Parameters:
f (io.BufferedWriter) –
- Return type:
None
- class learning.networks.DDP(size_s: int, size_a: int, nlayers: int, layer_width: int)#
Bases:
torch.nn.Module
Continuous action choice network for the agent.
- forward(x: torch.FloatTensor) torch.FloatTensor#
Compute the network forward pass.
- Parameters:
x (torch.FloatTensor) – Input tensor.
- Returns:
The network output.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
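A plain MLP matching the DDP constructor signature might look like the sketch below. The exact layer layout, ReLU activations, and the tanh output squashing are assumptions for illustration, not the module's confirmed architecture:

```python
import torch


class DDP(torch.nn.Module):
    """Sketch of a deterministic policy MLP: states in, actions out."""

    def __init__(self, size_s: int, size_a: int, nlayers: int, layer_width: int):
        super().__init__()
        layers = [torch.nn.Linear(size_s, layer_width), torch.nn.ReLU()]
        for _ in range(nlayers - 2):
            layers += [torch.nn.Linear(layer_width, layer_width), torch.nn.ReLU()]
        layers.append(torch.nn.Linear(layer_width, size_a))
        layers.append(torch.nn.Tanh())  # assumed: bound actions to [-1, 1]
        self.net = torch.nn.Sequential(*layers)

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        return self.net(x)
```

With the documented defaults (nlayers=4, layer_width=256) this yields four linear layers in total.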
- class learning.networks.Critic(size_s: int, size_a: int, nlayers: int = 4, layer_width: int = 256, lr: float = 0.001)#
Critic class encapsulating the critic and training process for the DDPG critic.
- __call__(states: torch.FloatTensor, actions: torch.FloatTensor) torch.FloatTensor#
Run a critic net forward pass.
- Parameters:
states (torch.FloatTensor) – Input states.
actions (torch.FloatTensor) – Input actions.
- Returns:
An action value tensor.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
lr (float) –
- target(states: torch.FloatTensor, actions: torch.FloatTensor) torch.FloatTensor#
Compute the action value with the target network.
- Parameters:
states (torch.FloatTensor) – Input states.
actions (torch.FloatTensor) – Input actions.
- Returns:
An action value tensor.
- Return type:
torch.FloatTensor
- backward_step(loss: torch.FloatTensor) None#
Perform a backward pass with an optimizer step.
- Parameters:
loss (torch.FloatTensor) – Critic network loss.
- Return type:
None
- update_target(tau: float = 0.05) None#
Update the target network with a soft parameter transfer update.
- Parameters:
tau (float) – Averaging fraction of the parameter update for the critic network.
- Return type:
None
- load(checkpoint: Any) None#
Load data for the critic.
- Parameters:
checkpoint (Any) – dict containing loaded data.
- Return type:
None
- save(f: io.BufferedWriter) None#
Save data for the critic.
- Parameters:
f (io.BufferedWriter) –
- Return type:
None
- class learning.networks.CriticNetwork(size_s: int, size_a: int, nlayers: int, layer_width: int)#
Bases:
torch.nn.Module
State-action critic network for the critic.
- forward(state: torch.FloatTensor, action: torch.FloatTensor) torch.FloatTensor#
Compute the network forward pass.
- Parameters:
state (torch.FloatTensor) – Input state tensor.
action (torch.FloatTensor) – Input action tensor.
- Returns:
The network output.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
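A state-action Q-network with this constructor signature is commonly built by concatenating state and action at the input and emitting a scalar value. The sketch below assumes that layout and ReLU activations; it is not the module's confirmed implementation:

```python
import torch


class CriticNetwork(torch.nn.Module):
    """Sketch of a Q-network: Q(state, action) -> scalar action value."""

    def __init__(self, size_s: int, size_a: int, nlayers: int, layer_width: int):
        super().__init__()
        layers = [torch.nn.Linear(size_s + size_a, layer_width), torch.nn.ReLU()]
        for _ in range(nlayers - 2):
            layers += [torch.nn.Linear(layer_width, layer_width), torch.nn.ReLU()]
        layers.append(torch.nn.Linear(layer_width, 1))  # scalar action value
        self.net = torch.nn.Sequential(*layers)

    def forward(self, state: torch.FloatTensor, action: torch.FloatTensor) -> torch.FloatTensor:
        return self.net(torch.cat([state, action], dim=-1))
```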
- learning.networks.train_actor_critic(actor: Actor, actor_expert: Actor, critic: Critic, state_norm: jacta.learning.normalizer.Normalizer, replay_buffer: jacta.learning.replay_buffer.ReplayBuffer, reward_fun: Callable, batch_size: int = 256, discount_factor: float = 0.98, her_probability: float = 0.0) float#
Train the actor and critic networks with experience sampled from the replay buffer.
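The core loss computation of such a DDPG-style update step can be sketched as below. This follows the standard DDPG recipe (bootstrapped critic target, actor maximizing the critic's estimate); the exact target formula and function names are assumptions, not the module's confirmed code, and HER relabeling is omitted:

```python
import torch


def ddpg_losses(actor, actor_target, critic, critic_target,
                states, actions, rewards, next_states,
                discount_factor: float = 0.98):
    """Compute one actor loss and one critic loss for a sampled batch."""
    with torch.no_grad():
        # bootstrapped target: r + gamma * Q_target(s', pi_target(s'))
        next_actions = actor_target(next_states)
        q_next = critic_target(next_states, next_actions)
        q_target = rewards + discount_factor * q_next
    critic_loss = torch.nn.functional.mse_loss(critic(states, actions), q_target)
    # the actor maximizes the critic's value estimate for its own actions
    actor_loss = -critic(states, actor(states)).mean()
    return actor_loss, critic_loss
```

Each loss would then be passed to the respective backward_step, followed by update_target on both networks.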
- learning.networks.train_actor_imitation(actor: Actor, state_norm: jacta.learning.normalizer.Normalizer, states: torch.FloatTensor, goals: torch.FloatTensor, actor_actions: torch.FloatTensor, batch_size: int = 256) float#
Train the actor network by imitation on recorded expert state-action data.
- Parameters:
actor (Actor) –
state_norm (jacta.learning.normalizer.Normalizer) –
states (torch.FloatTensor) –
goals (torch.FloatTensor) –
actor_actions (torch.FloatTensor) –
batch_size (int) –
- Return type:
float
- learning.networks.train_critic_imitation(critic: Critic, state_norm: jacta.learning.normalizer.Normalizer, states: torch.FloatTensor, goals: torch.FloatTensor, actor_actions: torch.FloatTensor, q_values: torch.FloatTensor, batch_size: int = 256) float#
Train the critic network by imitation on recorded actions and action values.
- Parameters:
critic (Critic) –
state_norm (jacta.learning.normalizer.Normalizer) –
states (torch.FloatTensor) –
goals (torch.FloatTensor) –
actor_actions (torch.FloatTensor) –
q_values (torch.FloatTensor) –
batch_size (int) –
- Return type:
float
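The two imitation objectives above can be sketched as simple regression losses. The MSE loss form, the normalizer call, and the concatenation of normalized state and goal into the network input are all assumptions for illustration:

```python
import torch


def actor_imitation_loss(actor, state_norm, states, goals, actor_actions):
    """Behavioral cloning: regress the actor's output onto expert actions."""
    inputs = torch.cat([state_norm(states), goals], dim=-1)
    return torch.nn.functional.mse_loss(actor(inputs), actor_actions)


def critic_imitation_loss(critic, state_norm, states, goals, actor_actions, q_values):
    """Regress the critic's value estimate onto recorded Q-values."""
    inputs = torch.cat([state_norm(states), goals], dim=-1)
    return torch.nn.functional.mse_loss(critic(inputs, actor_actions), q_values)
```

In the documented functions these losses would be computed over minibatches of size batch_size and the scalar loss value returned as a float.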