TorchFort Configuration Files¶
The TorchFort library relies on a user-defined YAML configuration file to define several aspects of the training procedure, with specific blocks to control:
- general properties
- model properties
- optimizer properties
- loss function properties
- learning rate schedule properties
The following sections define each configuration block and available options.
Common¶
The following sections list configuration file blocks common to supervised learning and reinforcement learning configuration files.
General Properties¶
The block in the configuration file defining general properties takes the following structure:
```yaml
general:
  <option>: <value>
```
The following table lists the available options:
| Option | Data Type | Description |
|---|---|---|
|  | integer | frequency of reported TorchFort training/validation output lines to terminal (default = ) |
|  | boolean | flag to control whether wandb hook is active (default = ) |
|  | boolean | flag to control verbose output from TorchFort (default = ) |
For more information about the wandb hook, see Weights and Biases Support.
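For illustration, a general block could look like the following sketch. The option names used here (report_frequency, enable_wandb_hook, verbose) are assumptions chosen to match the descriptions in the table above; consult the TorchFort examples for the exact identifiers and default values.

```yaml
general:
  # option names below are assumed for illustration
  report_frequency: 100     # report a training/validation line every 100 steps
  enable_wandb_hook: false  # keep the wandb hook disabled
  verbose: false            # no verbose TorchFort output
```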
Optimizer Properties¶
The block in the configuration file defining optimizer properties takes the following structure:
```yaml
optimizer:
  type: <optimizer_type>
  parameters:
    <option>: <value>
```
The following table lists the available optimizer types:
| Optimizer Type | Description |
|---|---|
|  | Stochastic Gradient Descent optimizer |
|  | ADAM optimizer |
The following table lists the available options by optimizer type:
| Optimizer Type | Option | Data Type | Description |
|---|---|---|---|
| SGD |  | float | learning rate (default = ) |
|  |  | float | momentum factor (default = ) |
|  |  | float | dampening for momentum (default = ) |
|  |  | float | weight decay/L2 penalty (default = ) |
|  |  | boolean | enables Nesterov momentum (default = ) |
| ADAM |  | float | learning rate (default = ) |
|  |  | float | coefficient used for computing running average of gradient (default = ) |
|  |  | float | coefficient used for computing running average of square of gradient (default = ) |
|  |  | float | weight decay/L2 penalty (default = ) |
|  |  | float | term added to denominator to improve numerical stability (default = ) |
|  |  | boolean | whether to use AMSGrad variant (default = ) |
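For example, an ADAM optimizer block could be written as in the following sketch. The type string, option names, and values are assumptions modeled on the corresponding PyTorch optimizer (torch.optim.Adam); the authoritative names are those listed in the table above.

```yaml
optimizer:
  type: adam                 # assumed type string for the ADAM optimizer
  parameters:
    learning_rate: 0.001     # assumed option name for the learning rate
    beta1: 0.9               # running-average coefficient for the gradient (assumed name)
    beta2: 0.999             # running-average coefficient for the squared gradient (assumed name)
    weight_decay: 0.0        # weight decay/L2 penalty
    eps: 1.0e-8              # term added to the denominator for numerical stability
    amsgrad: false           # do not use the AMSGrad variant
```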
Learning Rate Schedule Properties¶
The block in the configuration file defining learning rate schedule properties takes the following structure:
```yaml
lr_scheduler:
  type: <schedule_type>
  parameters:
    <option>: <value>
```
The following table lists the available schedule types:
| Schedule Type | Description |
|---|---|
|  | Decays learning rate by multiplicative factor after every fixed number of training steps |
|  | Decays learning rate by multiplicative factor at user-defined training iteration milestones |
|  | Decays learning rate by polynomial function |
|  | Decays learning rate using cosine annealing schedule. See PyTorch documentation of torch.optim.lr_scheduler.CosineAnnealingLR for more details. |
The following table lists the available options by schedule type:
| Schedule Type | Option | Data Type | Description |
|---|---|---|---|
| Step |  | integer | Number of training steps between learning rate decays |
|  |  | float | Multiplicative factor of learning rate decay (default = ) |
| Multistep |  | list of integers | Training step milestones for learning rate decay |
|  |  | float | Multiplicative factor of learning rate decay (default = ) |
| Polynomial |  | integer | Number of training iterations to decay the learning rate |
|  |  | float | The power of the polynomial (default = ) |
| Cosine annealing |  | float | Minimum learning rate (default = ) |
|  |  | float | Maximum number of iterations for decay |
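As a sketch, a step-decay schedule could be configured as follows. The type string and option names (step_size, gamma) are assumptions modeled on torch.optim.lr_scheduler.StepLR; refer to the table above for the options TorchFort actually accepts.

```yaml
lr_scheduler:
  type: step          # assumed type string for the step-decay schedule
  parameters:
    step_size: 1000   # assumed name: decay the learning rate every 1000 training steps
    gamma: 0.5        # assumed name: multiplicative decay factor
```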
Supervised Learning¶
The following sections list configuration file blocks specific to supervised learning configuration files.
Model Properties¶
The block in the configuration file defining model properties takes the following structure:
```yaml
model:
  type: <model_type>
  parameters:
    <option>: <value>
```
The following table lists the available model types:
| Model Type | Description |
|---|---|
|  | Load a model from an exported TorchScript file |
|  | Use built-in MLP model |
The following table lists the available options by model type:
| Model Type | Option | Data Type | Description |
|---|---|---|---|
| TorchScript |  | string | path to TorchScript exported model file |
| MLP |  | list of integers | sequence of input/output sizes for the linear layers |
|  |  | float | probability of an element to be zeroed in dropout layers (default = ) |
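For example, the built-in MLP model could be configured roughly as follows. The type string and option names (layer_sizes, dropout) are assumptions chosen to match the descriptions above.

```yaml
model:
  type: mlp                        # assumed type string for the built-in MLP model
  parameters:
    layer_sizes: [32, 64, 64, 1]   # assumed name: input/output sizes of the linear layers
    dropout: 0.0                   # assumed name: dropout probability
```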
Loss Properties¶
The block in the configuration file defining loss properties takes the following structure:
```yaml
loss:
  type: <loss_type>
  parameters:
    <option>: <value>
```
The following table lists the available loss types:
| Loss Type | Description |
|---|---|
|  | L1/Mean Absolute Error |
|  | Mean Squared Error |
The following table lists the available options by loss type:
| Loss Type | Option | Data Type | Description |
|---|---|---|---|
| L1 |  | string | Specifies type of reduction to apply to output |
| MSE |  | string | Specifies type of reduction to apply to output |
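A mean squared error loss block could look like the following sketch. The type string and the reduction value are assumptions modeled on torch.nn.MSELoss.

```yaml
loss:
  type: mse            # assumed type string for the MSE loss
  parameters:
    reduction: mean    # assumed value: average the loss over all elements
```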
Reinforcement Learning¶
The following sections list configuration file blocks specific to reinforcement learning system configuration files.
Reinforcement Learning Training Algorithm Properties¶
The block in the configuration file defining algorithm properties takes the following structure:
```yaml
algorithm:
  type: <algorithm_type>
  parameters:
    <option>: <value>
```
The following table lists the available algorithm types:
| Algorithm Type | Description |
|---|---|
| ddpg | Deep Deterministic Policy Gradient. See DDPG documentation by OpenAI for details |
| td3 | Twin Delayed DDPG. See TD3 documentation by OpenAI for details |
| sac | Soft Actor Critic. See SAC documentation by OpenAI for details |
The following table lists the available options by algorithm type:
| Algorithm Type | Option | Data Type | Description |
|---|---|---|---|
| ddpg |  | integer | batch size used in training |
|  | nstep | integer | number of steps for N-step training |
|  | nstep_reward_reduction | string | reduction mode for N-step training (see below) |
|  |  | float | discount factor |
|  |  | float | weight average factor for target weights (in some frameworks called rho = 1-tau) |
| td3 |  | integer | batch size used in training |
|  | nstep | integer | number of steps for N-step training |
|  | nstep_reward_reduction | string | reduction mode for N-step training (see below) |
|  |  | float | discount factor |
|  |  | float | weight average factor for target weights (in some frameworks called rho = 1-tau) |
|  | num_critic | integer | number of critic networks used |
|  |  | integer | update frequency for the policy in units of critic updates |
| sac |  | integer | batch size used in training |
|  | nstep | integer | number of steps for N-step training |
|  | nstep_reward_reduction | string | reduction mode for N-step training (see below) |
|  |  | float | discount factor |
|  |  | float | entropy regularization coefficient |
|  |  | float | weight average factor for target weights (in some frameworks called rho = 1-tau) |
|  |  | integer | update frequency for the policy in units of value updates |
The parameter nstep_reward_reduction
defines how the reward is accumulated over N-step rollouts. The options are summarized in a table below (\(N\) is the value from parameter nstep
described above):
| Reduction Mode | Description |
|---|---|
|  | \(r = \sum_{i=1}^{N^\ast} \gamma^{i-1} r_i\) |
|  | \(r = \sum_{i=1}^{N^\ast} \gamma^{i-1} r_i / N^\ast\) |
|  | \(r = \sum_{i=1}^{N^\ast} \gamma^{i-1} r_i / (\sum_{k=1}^{N^\ast} \gamma^{k-1})\) |
Here, the value of \(N^\ast\) depends on whether reduction with or without skip is chosen. In the former case, \(N^\ast = N\) and the replay buffer searches for trajectories with at least \(N\) steps. If a trajectory terminates earlier, the sample is skipped and a new one is drawn. If all trajectories are shorter than \(N\) steps, the replay buffer will never find a suitable sample.
In this case, it is useful to use the modes with the additional suffix _no_skip
. For these modes, \(N^{\ast}\) in the formulas above is equal to the minimum of \(N\) and the number of steps remaining until the end of the trajectory. The regular and no-skip modes are both useful in different situations, so it is important to be clear about how the reward structure has to be designed in order to achieve the desired goals.
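For illustration, a td3 algorithm block could be sketched as follows. The names nstep, nstep_reward_reduction, and num_critic appear elsewhere in this document; the remaining option names (batch_size, gamma, rho, policy_lag) and the reduction-mode value are assumptions chosen to match the descriptions in the table above.

```yaml
algorithm:
  type: td3                               # Twin Delayed DDPG
  parameters:
    batch_size: 128                       # assumed name: batch size used in training
    nstep: 4                              # accumulate rewards over 4-step rollouts
    nstep_reward_reduction: mean_no_skip  # assumed value: averaged reward, no-skip variant
    gamma: 0.99                           # assumed name: discount factor
    rho: 0.995                            # assumed name: weight average factor for target weights
    num_critic: 2                         # number of critic networks used
    policy_lag: 2                         # assumed name: policy update frequency in critic updates
```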
Replay Buffer Properties¶
The block in the configuration file defining replay buffer properties takes the following structure:
```yaml
replay_buffer:
  type: <replay_buffer_type>
  parameters:
    <option>: <value>
```
Currently, only type uniform
is supported. The following table lists the available options:
| Replay Buffer Type | Option | Data Type | Description |
|---|---|---|---|
| uniform |  | integer | Minimum number of samples before buffer is ready for training |
|  |  | integer | Maximum capacity of the buffer |
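A uniform replay buffer could be configured roughly as follows; the option names (min_size, max_size) are assumptions chosen to match the descriptions above.

```yaml
replay_buffer:
  type: uniform        # uniform sampling, currently the only supported type
  parameters:
    min_size: 1024     # assumed name: samples required before the buffer is ready for training
    max_size: 100000   # assumed name: maximum buffer capacity
```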
Action Properties¶
The block in the configuration file defining action properties takes the following structure:
```yaml
action:
  type: <action_type>
  parameters:
    <option>: <value>
```
The following table lists the available options for every action type for ddpg
and td3
algorithms:
| Action Type | Option | Data Type | Description |
|---|---|---|---|
| Gaussian noise |  | float | lower bound for action value |
|  |  | float | upper bound for action value |
|  |  | float | clip value for training noise |
|  |  | float | standard deviation for Gaussian training noise |
|  |  | float | standard deviation for Gaussian exploration noise |
|  |  | boolean | flag to specify whether the standard deviation should be adaptive |
| Ornstein-Uhlenbeck noise |  | float | lower bound for action value |
|  |  | float | upper bound for action value |
|  |  | float | clip value for training noise |
|  |  | float | standard deviation for Ornstein-Uhlenbeck training noise |
|  |  | float | standard deviation for Ornstein-Uhlenbeck exploration noise |
|  |  | float | mean reversion parameter for Ornstein-Uhlenbeck noise |
|  |  | float | time-step parameter for Ornstein-Uhlenbeck noise |
|  |  | boolean | flag to specify whether the standard deviation should be adaptive |
The meaning of most of these parameters should be evident from the details of the implementations of the various RL algorithms linked above.
However, some parameters require a more detailed explanation: in general, the suffix _ou
refers to stateful noise of Ornstein-Uhlenbeck type with zero drift. This noise type is often used when correlation between time steps is desired and is therefore popular in reinforcement learning; see the Wikipedia page on the Ornstein-Uhlenbeck process for details.
The prefix space
refers to applying the noise to the predicted action directly. For example, if \(p\) is our (deterministic) policy function, an exploration action using the space noise type is obtained by computing

\(a = p_{\theta}(s) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2),\)

for any input state \(s\) and policy weights \(\theta\). In case of parameter noise, the noise is applied to each weight of \(p\) instead. Hence, the noised action is computed via

\(a = p_{\theta + \epsilon}(s), \quad \epsilon \sim \mathcal{N}(0, \sigma^2).\)
The parameter adaptive
specifies whether the noise standard deviation \(\sigma\) should be taken relative to the magnitude of the actions or of the weights, for space and parameter noise respectively. For space noise, this means that

\(a = p_{\theta}(s) + \epsilon, \quad \epsilon \sim \mathcal{N}\bigl(0, (\sigma \, \lvert p_{\theta}(s) \rvert)^2\bigr),\)

and analogously for parameter noise.
Which noise type and parameters work best depends strongly on the behavior of the environment, so we cannot give a general recommendation.
For algorithm type sac
, only action bounds are supported as the noise is built into the algorithm and cannot be customized.
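As a sketch, an action block using Gaussian action-space noise might look as follows. The type string and all option names here are assumptions chosen to match the table above; check the TorchFort reinforcement learning examples for the exact identifiers.

```yaml
action:
  type: space_noise      # assumed type string: Gaussian noise applied directly to the action
  parameters:
    a_low: -1.0          # assumed name: lower bound for action value
    a_high: 1.0          # assumed name: upper bound for action value
    clip: 0.5            # assumed name: clip value for training noise
    sigma_train: 0.2     # assumed name: standard deviation of the Gaussian training noise
    sigma_explore: 0.1   # assumed name: standard deviation of the Gaussian exploration noise
    adaptive: false      # assumed name: use a fixed (non-adaptive) standard deviation
```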
Policy and Critic Properties¶
The blocks in the configuration file defining model properties for the actor/policy and critic/value networks are similar to the supervised learning case (see Model Properties). In this case, TorchFort supports different model properties for the policy and the critic. The block configuration looks as follows:
```yaml
critic_model:
  type: <critic_model_type>
  parameters:
    <option>: <value>

policy_model:
  type: <policy_model_type>
  parameters:
    <option>: <value>
```
Refer to the Model Properties section for the available model types and options.
Note
For algorithms that use multiple critic networks, such as TD3, the critic model is copied internally num_critic
times and the weights of each of these copies are initialized randomly and independently.
Note
In case of the SAC algorithm, make sure that the policy network returns not only the action mean tensor but also the log standard deviation (sigma) tensor. As an example, see the policy function implementation in Stable Baselines.
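For example, using the built-in MLP model for both networks could look like the following sketch (type string and option names as in the hypothetical MLP example above):

```yaml
critic_model:
  type: mlp                        # assumed type string
  parameters:
    layer_sizes: [48, 64, 64, 1]   # assumed name: critic maps state-action input to a scalar value
    dropout: 0.0

policy_model:
  type: mlp                        # assumed type string
  parameters:
    layer_sizes: [32, 64, 64, 16]  # assumed name: policy maps state input to action output
    dropout: 0.0
```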
Learning Rate Schedule Properties¶
For reinforcement learning, TorchFort supports different learning rate schedules for policy and critic. The block configuration looks as follows:
```yaml
critic_lr_scheduler:
  type: <schedule_type>
  parameters:
    <option>: <value>

policy_lr_scheduler:
  type: <schedule_type>
  parameters:
    <option>: <value>
```
Refer to the Learning Rate Schedule Properties section for the available schedule types and options.
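For instance, both schedules could use the same step decay, sketched here with the assumed type string and option names from the scheduler example above:

```yaml
critic_lr_scheduler:
  type: step          # assumed type string
  parameters:
    step_size: 1000   # assumed option name
    gamma: 0.5        # assumed option name

policy_lr_scheduler:
  type: step          # assumed type string
  parameters:
    step_size: 1000   # assumed option name
    gamma: 0.5        # assumed option name
```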