learning.networks
The networks module contains the actor and critic classes as well as the underlying policy and critic networks.
The Actor acts as a wrapper around the actual deterministic policy network to provide action selection and loading utilities.
DDP is a vanilla deep deterministic policy network implementation.
Module Contents
- learning.networks.soft_update(network: torch.nn.Module, target: torch.nn.Module, tau: float) → torch.nn.Module
Perform a soft update of the target network's weights.
Shifts the weights of the target by a factor of tau in the direction of the network.
- Parameters:
network (torch.nn.Module) – Network from which to copy the weights.
target (torch.nn.Module) – Network that gets updated.
tau (float) – Controls how much the weights are shifted. Valid in [0, 1].
- Returns:
The updated target network.
- Return type:
torch.nn.Module
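For reference, a minimal sketch of the same update rule written directly against torch.nn.Module parameters (an illustration of target ← tau · network + (1 − tau) · target, not the library's actual implementation):

```python
import torch


@torch.no_grad()
def soft_update_sketch(network: torch.nn.Module, target: torch.nn.Module, tau: float) -> torch.nn.Module:
    """Shift the target parameters towards the network parameters by a factor of tau."""
    for param, target_param in zip(network.parameters(), target.parameters()):
        # target <- tau * network + (1 - tau) * target
        target_param.data.mul_(1.0 - tau).add_(tau * param.data)
    return target
```

With tau = 0 the target is left unchanged; with tau = 1 it becomes an exact copy of the network.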
- class learning.networks.Actor(size_s: int, size_a: int, nlayers: int = 4, layer_width: int = 256, lr: float = 0.001, eps: float = 0.3, action_clip: float = 1.0)
Actor class encapsulating the action selection and training process for the DDPG actor.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
lr (float) –
eps (float) –
action_clip (float) –
- select_action(state_norm: jacta.learning.normalizer.Normalizer, state: torch.FloatTensor, goal: torch.FloatTensor, exploration_function: Optional[Callable] = None) → torch.FloatTensor
Select an action for the given input states.
If in train mode, samples noise and chooses completely random actions with probability self.eps. If in evaluation mode, only clips the action to the maximum value.
- Parameters:
states – Input states.
- Returns:
A numpy array of actions.
- __call__(states: torch.FloatTensor) → torch.FloatTensor
Run a forward pass directly on the action net.
- Parameters:
states (torch.FloatTensor) – Input states.
- Returns:
An action tensor.
- Return type:
torch.FloatTensor
- backward_step(loss: torch.FloatTensor) → None
Perform a backward pass with an optimizer step.
- Parameters:
loss (torch.FloatTensor) – Actor network loss.
- Return type:
None
- target(states: torch.FloatTensor) → torch.FloatTensor
Compute actions with the target network and without noise.
- Parameters:
states (torch.FloatTensor) – Input states.
- Returns:
An action tensor.
- Return type:
torch.FloatTensor
- eval() → None
Set the actor to eval mode without noise in the action selection.
- Return type:
None
- train() → None
Set the actor to train mode with noisy actions.
- Return type:
None
- update_target(tau: float = 0.05) → None
Update the target network with a soft parameter transfer update.
- Parameters:
tau (float) – Averaging fraction of the parameter update for the action network.
- Return type:
None
- load(checkpoint: Any) → None
Load data for the actor.
- Parameters:
checkpoint (Any) – dict containing loaded data.
- Return type:
None
- save(f: io.BufferedWriter) → None
Save data for the actor.
- Parameters:
f (io.BufferedWriter) –
- Return type:
None
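A hedged usage sketch of the documented Actor interface. The constructor arguments and method names come from the signatures above; the state and goal shapes and the Normalizer construction are assumptions made only for illustration:

```python
import torch
from jacta.learning.normalizer import Normalizer  # module path taken from the signatures above

from learning.networks import Actor

size_s, size_a = 10, 4  # assumed state and action sizes
actor = Actor(size_s=size_s, size_a=size_a, nlayers=4, layer_width=256,
              lr=1e-3, eps=0.3, action_clip=1.0)

state_norm = Normalizer(size_s)  # assumed constructor; see the normalizer module for the real one
state = torch.zeros(size_s)      # assumed shapes
goal = torch.zeros(size_s)

actor.train()                                          # noisy, exploratory action selection
action = actor.select_action(state_norm, state, goal)

actor.eval()                                           # deterministic actions, clipped to action_clip
action = actor.select_action(state_norm, state, goal)

actor.update_target(tau=0.05)                          # soft update of the target policy
```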
- class learning.networks.DDP(size_s: int, size_a: int, nlayers: int, layer_width: int)
Bases: torch.nn.Module
Continuous action choice network for the agent.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
- forward(x: torch.FloatTensor) → torch.FloatTensor
Compute the network forward pass.
- Parameters:
x – Input tensor.
- Returns:
The network output.
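The DDP architecture itself is not shown in this reference. The sketch below is one plausible layout consistent with the constructor arguments (an MLP with nlayers hidden layers of layer_width units and a bounded output); it is an assumption, not the library's actual network:

```python
import torch


class DDPSketch(torch.nn.Module):
    """Plausible DDP-style policy network: MLP from state to action with a tanh output."""

    def __init__(self, size_s: int, size_a: int, nlayers: int, layer_width: int):
        super().__init__()
        layers: list[torch.nn.Module] = []
        in_features = size_s
        for _ in range(nlayers):
            layers += [torch.nn.Linear(in_features, layer_width), torch.nn.ReLU()]
            in_features = layer_width
        layers.append(torch.nn.Linear(in_features, size_a))
        self.net = torch.nn.Sequential(*layers)

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        # tanh keeps actions bounded in [-1, 1]; the real output layer may differ
        return torch.tanh(self.net(x))
```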
- class learning.networks.Critic(size_s: int, size_a: int, nlayers: int = 4, layer_width: int = 256, lr: float = 0.001)
Critic class encapsulating the critic and training process for the DDPG critic.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
lr (float) –
- __call__(states: torch.FloatTensor, actions: torch.FloatTensor) → torch.FloatTensor
Run a critic net forward pass.
- Parameters:
states – Input states.
actions – Input actions.
- Returns:
An action value tensor.
- target(states: torch.FloatTensor, actions: torch.FloatTensor) → torch.FloatTensor
Compute the action value with the target network.
- Parameters:
states (torch.FloatTensor) – Input states.
actions (torch.FloatTensor) – Input actions.
- Returns:
An action value tensor.
- Return type:
torch.FloatTensor
- backward_step(loss: torch.FloatTensor) → None
Perform a backward pass with an optimizer step.
- Parameters:
loss (torch.FloatTensor) – Critic network loss.
- Return type:
None
- update_target(tau: float = 0.05) → None
Update the target network with a soft parameter transfer update.
- Parameters:
tau (float) – Averaging fraction of the parameter update for the critic network.
- Return type:
None
- load(checkpoint: Any) → None
Load data for the critic.
- Parameters:
checkpoint (Any) – dict containing loaded data.
- Return type:
None
- save(f: io.BufferedWriter) → None
Save data for the critic.
- Parameters:
f (io.BufferedWriter) –
- Return type:
None
- class learning.networks.CriticNetwork(size_s: int, size_a: int, nlayers: int, layer_width: int)
Bases: torch.nn.Module
State-action critic network for the critic.
- Parameters:
size_s (int) –
size_a (int) –
nlayers (int) –
layer_width (int) –
- forward(state: torch.FloatTensor, action: torch.FloatTensor) → torch.FloatTensor
Compute the network forward pass.
- Parameters:
state – Input state tensor.
action – Input action tensor.
- Returns:
The network output.
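As with DDP, the internals are not documented here. A common state-action critic layout, given purely as an assumed sketch, concatenates state and action and maps the pair to a scalar value:

```python
import torch


class CriticNetworkSketch(torch.nn.Module):
    """Plausible state-action critic: MLP over the concatenated (state, action) pair."""

    def __init__(self, size_s: int, size_a: int, nlayers: int, layer_width: int):
        super().__init__()
        layers: list[torch.nn.Module] = []
        in_features = size_s + size_a
        for _ in range(nlayers):
            layers += [torch.nn.Linear(in_features, layer_width), torch.nn.ReLU()]
            in_features = layer_width
        layers.append(torch.nn.Linear(in_features, 1))  # scalar action value
        self.net = torch.nn.Sequential(*layers)

    def forward(self, state: torch.FloatTensor, action: torch.FloatTensor) -> torch.FloatTensor:
        return self.net(torch.cat([state, action], dim=-1))
```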
- learning.networks.train_actor_critic(actor: Actor, actor_expert: Actor, critic: Critic, state_norm: jacta.learning.normalizer.Normalizer, replay_buffer: jacta.learning.replay_buffer.ReplayBuffer, reward_fun: Callable, batch_size: int = 256, discount_factor: float = 0.98, her_probability: float = 0.0) → float
Train the actor and critic networks with experience sampled from the replay buffer.
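Sampling, HER relabeling, and state normalization are handled inside the library. The sketch below only illustrates the standard DDPG losses such a training step computes, expressed through the documented Actor and Critic interfaces; the batch tensors and reward convention are assumptions:

```python
import torch


def ddpg_losses_sketch(actor, critic, states, actions, rewards, next_states, discount_factor=0.98):
    """Standard DDPG losses for one batch (illustrative only)."""
    with torch.no_grad():
        next_actions = actor.target(next_states)             # target policy actions
        q_next = critic.target(next_states, next_actions)    # target action values
        q_target = rewards + discount_factor * q_next        # one-step Bellman target
    critic_loss = (critic(states, actions) - q_target).pow(2).mean()  # TD error
    actor_loss = -critic(states, actor(states)).mean()                # ascend the critic's value
    return actor_loss, critic_loss
```

Each loss would then go through the corresponding backward_step, followed by update_target on both the actor and the critic.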
- learning.networks.train_actor_imitation(actor: Actor, state_norm: jacta.learning.normalizer.Normalizer, states: torch.FloatTensor, goals: torch.FloatTensor, actor_actions: torch.FloatTensor, batch_size: int = 256) → float
- Parameters:
actor (Actor) –
state_norm (jacta.learning.normalizer.Normalizer) –
states (torch.FloatTensor) –
goals (torch.FloatTensor) –
actor_actions (torch.FloatTensor) –
batch_size (int) –
- Return type:
float
- learning.networks.train_critic_imitation(critic: Critic, state_norm: jacta.learning.normalizer.Normalizer, states: torch.FloatTensor, goals: torch.FloatTensor, actor_actions: torch.FloatTensor, q_values: torch.FloatTensor, batch_size: int = 256) → float
- Parameters:
critic (Critic) –
state_norm (jacta.learning.normalizer.Normalizer) –
states (torch.FloatTensor) –
goals (torch.FloatTensor) –
actor_actions (torch.FloatTensor) –
q_values (torch.FloatTensor) –
batch_size (int) –
- Return type:
float
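Neither imitation routine is documented beyond its signature. As a loose illustration of what an actor imitation update could look like, here is a behavior-cloning style step built only on the documented Actor interface; the function name, data layout, and loss choice are assumptions:

```python
import torch


def actor_imitation_step_sketch(actor, states, target_actions):
    """One behavior-cloning style update: regress the policy onto target actions (assumed data)."""
    predicted = actor(states)                                       # forward pass on the action net
    loss = torch.nn.functional.mse_loss(predicted, target_actions)  # assumed imitation loss
    actor.backward_step(loss)                                       # documented optimizer step
    return float(loss.item())
```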