TorchFort Fortran API¶

These are all the types and functions available in the TorchFort Fortran API.

General¶

Types¶

torchfort_datatype¶

See documentation for equivalent C enumerator, torchfort_datatype_t.

torchfort_result¶

See documentation for equivalent C enumerator, torchfort_result_t.

torchfort_tensor_list¶

See documentation for equivalent C typedef, torchfort_tensor_list_t.

Global Context Settings¶

These are global routines which affect the behavior of the libtorch backend. It is therefore recommended to call these functions before any other TorchFort calls are made.

torchfort_set_cudnn_benchmark¶

function torchfort_set_cudnn_benchmark(flag)¶

Enables or disables cuDNN benchmark mode. See the PyTorch documentation for more details.

Parameters:: flag [logical,in] :: A flag to enable (.true.) or disable (.false.) cuDNN kernel benchmarking.
Return:: res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Tensor List Management¶

torchfort_tensor_list_create¶

function torchfort_tensor_list_create(tensor_list)¶

Creates a TorchFort tensor list.

Parameters:: tensor_list [torchfort_tensor_list,out] :: An unintialized TorchFort tensor list.
Return:: res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_tensor_list_destroy¶

function torchfort_tensor_list_destroy(tensor_list)¶

Destroys a TorchFort tensor list.

Parameters:: tensor_list [torchfort_tensor_list,in] :: A TorchFort tensor list.
Return:: res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_tensor_list_add_tensor¶

function torchfort_tensor_list_add_tensor(tensor_list, data_arr)¶

Adds a tensor to a TorchFort tensor list. Tensor data is added by reference, so changes to externally provided memory will modify tensors contained in the list.

For this operation, T can be one of real(real32), real(real64), integer(int32), integer(int64)

Parameters:

tensor_list [torchfort_tensor_list,in] :: A TorchFort tensor list.
data_arr [T(*),in] :: An array containing the tensor data.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Supervised Learning¶

Model Creation¶

torchfort_create_model¶

function torchfort_create_model(name, config_fname, device)¶

Creates a model from a provided configuration file.

Parameters:

handle [character(:),in] :: A name to assign to the created model instance to use as a key for other TorchFort routines.
config_fname [character(:),in] :: The filesystem path to the user-defined model configuration file to use.
device [integer,in] :: Which device to place and run the model on. For TORCHFORT_DEVICE_CPU (-1), model will be placed on CPU. For values >= 0, model will be placed on GPU with index corresponding to value.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_create_distributed_model¶

function torchfort_create_distributed_model(name, config_fname, mpi_comm, device)¶

Creates a distributed data-parallel model from a provided configuration file.

Parameters:

handle [character(:),in] :: A name to assign to the created model instance to use as a key for other TorchFort routines.
config_fname [character(:),in] :: The filesystem path to the user-defined configuration file to use.
mpi_comm [integer,in] :: MPI communicator to use to initialize NCCL communication library for data-parallel communication.
device [integer,in] :: Which device to place and run the model on. For TORCHFORT_DEVICE_CPU (-1), model will be placed on CPU. For values >= 0, model will be placed on GPU with index corresponding to value.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Model Training/Inference¶

torchfort_train¶

function torchfort_train(mname, input, label, loss_val, stream)¶

Runs a training iteration of a model instance using provided input and label data.

For this operation, T can be one of real(real32), real(real64)

Parameters:

mname [character(:),in] :: The key of the model instance.
input [T(*),in] :: An array containing the input data. The last array dimension should be the batch dimension, the other dimensions are the feature dimensions.
label [T(*),in] :: An array containing the label data. The last array dimension should be the batch dimension. label does not need to be of the same shape as input but the batch dimension should match. Additionally, label should be of the same rank as input.
loss_val [T,out] :: A variable that will hold the loss value computed during the training iteration.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_train_multiarg¶

function torchfort_train(mname, inputs, labels, loss_val, extra_loss_args, stream)

Runs a training iteration of a model instance using provided input and label tensor lists.

Parameters:

mname [character(:),in] :: The key of the model instance.
inputs [torchfort_tensor_list,in] :: A tensor list of input tensors.
labels [torchfort_tensor_list,in] :: A tensor list of label tensors.
loss_val [real(real32),out] :: A single precision scalar that will hold the loss value computed during the training iteration.
extra_loss_args [torchfort_tensor_list,in,optional] :: A tensor list of additional tensors to pass into loss computation.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_inference¶

function torchfort_inference(mname, input, output, stream)¶

Runs inference on a model using provided input data.

For this operation, T can be one of real(real32), real(real64)

Parameters:

mname [character(:),in] :: The key of the model instance.
input [T(*),in] :: An array containing the input data. The last array dimension should be the batch dimension, the other dimensions are the feature dimensions.
output [T(*),out] :: An array which will hold the output of the model. The last array dimension should be the batch dimension. output does not need to be of the same shape as input but the batch dimension should match. Additionally, output should be of the same rank as input.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_inference_multiarg¶

function torchfort_inference_multiarg(mname, inputs, outputs, stream)¶

Runs inference on a model using provided input data tensor list.

Parameters:

mname [character(:),in] :: The key of the model instance.
inputs [torchfort_tensor_list,in] :: A tensor list of input tensors.
outputs [torchfort_tensor_list,inout] :: A tensor list of output tensors.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Model Management¶

torchfort_save_model¶

function torchfort_save_model(mname, fname)¶

Saves a model to file.

Parameters:

mname [character(:),in] :: The name of model instance to save, as defined during model creation.
fname [character(:),in] :: The filename to save the model weights to.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_load_model¶

function torchfort_load_model(mname, fname)¶

Loads a model from a file.

Parameters:

mname [character(:),in] :: The name of model instance to load the model weights to, as defined during model creation.
fname [character(:),in] :: The filename to load the model weights from.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_save_checkpoint¶

function torchfort_save_checkpoint(mname, checkpoint_dir)¶

Saves a training checkpoint to a directory. In contrast to torchfort_save_model, this function saves additional state to restart training, like the optimizer states and learning rate schedule progress.

Parameters:

mname [character(:),in] :: The name of model instance to save, as defined during model creation.
checkpoint_dir [character(:),in] :: A writeable filesystem path to a directory to save the checkpoint data to.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_load_checkpoint¶

function torchfort_load_checkpoint(mname, checkpoint_dir)¶

Loads a training checkpoint from a directory. In contrast to torchfort_load_model, this function loads additional state to restart training, like the optimizer states and learning rate schedule progress.

Parameters:

mname [character(:),in] :: The name of model instance to load checkpoint data into, as defined during model creation.
checkpoint_dir [character(:),in] :: A readable filesystem path to a directory to load the checkpoint data from.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Weights and Biases Logging¶

torchfort_wandb_log_int¶

function torchfort_wandb_log_int(mname, metric_name, step, val)¶

Write an integer value to a Weights and Bias log. Use the _float and _double variants to write real32 and real64 values respectively.

Parameters:

mname [character(:),in] :: The name of model instance to associate this metric value with, as defined during model creation.
metric_name [character(:),in] :: Metric label.
step [integer,in] :: Training/inference step to associate with metric value.
val [integer,in] :: Metric value to log.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_wandb_log_float¶

function torchfort_wandb_log_float(mname, metric_name, step, val)¶

torchfort_wandb_log_double¶

function torchfort_wandb_log_double(mname, metric_name, step, val)¶

Reinforcement Learning¶

Similar to other reinforcement learning frameworks such as Spinning Up from OpenAI or Stable Baselines, we distinguish between on-policy and off-policy algorithms since those two types require different APIs.

Off-Policy Algorithms¶

System Creation¶

Basic routines to create and register a reinforcement learning system in the internal registry. A (synchronous) data parallel distributed option is available.

torchfort_rl_off_policy_create_system¶

function torchfort_rl_off_policy_create_system(name, config_fname, model_device, rb_device)¶

Creates an off-policy reinforcement learning training system instance from a provided configuration file.

Parameters:

name [character(:),in] :: A name to assign to the created training system instance to use as a key for other TorchFort routines.
config_fname [character(:),in] :: The filesystem path to the user-defined configuration file to use.
model_device [integer,in] :: Which device to place and run the model on. For TORCHFORT_DEVICE_CPU (-1), model will be placed on CPU. For values >= 0, model will be placed on GPU with index corresponding to value.
rb_device [integer,in] :: Which device to place and run the replay buffer on. For TORCHFORT_DEVICE_CPU (-1), replay buffer will be placed on CPU. For values >= 0, it will be placed on GPU with index corresponding to value.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_create_distributed_system¶

function torchfort_rl_off_policy_create_distributed_system(name, config_fname, mpi_comm, model_device, rb_device)¶

Creates a (synchronous) data-parallel off-policy reinforcement learning system instance from a provided configuration file.

Parameters:

name [character(:),in] :: A name to assign to the created training system instance to use as a key for other TorchFort routines.
config_fname [character(:),in] :: The filesystem path to the user-defined configuration file to use.
mpi_comm [integer,in] :: MPI communicator to use to initialize NCCL communication library for data-parallel communication.
model_device [integer,in] :: Which device to place and run the model on. For TORCHFORT_DEVICE_CPU (-1), model will be placed on CPU. For values >= 0, model will be placed on GPU with index corresponding to value.
rb_device [integer,in] :: Which device to place the replay buffer on. For TORCHFORT_DEVICE_CPU (-1), replay buffer will be placed on CPU. For values >= 0, it will be placed on GPU with index corresponding to value.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Training/Evaluation¶

These routines are be used for training the reinforcement learning system or for steering the environment.

torchfort_rl_off_policy_train_step¶

function torchfort_rl_off_policy_train_step(name, p_loss_val, q_loss_val, stream)¶

Runs a training iteration of an off-policy refinforcement learning instance and returns loss values for policy and value functions. This routine samples a batch of specified size from the replay buffer according to the buffers sampling procedure and performs a train step using this sample. The details of the training procedure are abstracted away from the user and depend on the chosen system algorithm. For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
p_loss_val [T,out] :: A single or double precision variable which will hold the policy loss value computed during the training iteration.
q_loss_val [T,out] :: A single or double precision variable which will hold the critic loss value computed during the training iteration, averaged over all available critics (depends on the chosen algorithm).
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_predict_explore¶

function torchfort_rl_off_policy_predict_explore(name, state, act, stream)¶

Suggests an action based on the current state of the system and adds noise as specified by the coprresponding reinforcement learning system. Depending on the reinforcement learning algorithm used, the prediction is performed by the main network (not the target network). In contrast to torchfort_rl_off_policy_predict, this routine adds noise and thus is called explorative. The kind of noise is specified during system creation.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the state space.
act [T,out] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the action space.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_predict¶

function torchfort_rl_off_policy_predict(name, state, act, stream)¶

Suggests an action based on the current state of the system. Depending on the algorithm used, the prediction is performed by the target network. In contrast to torchfort_rl_off_policy_predict_explore, this routine does not add noise, which means it is exploitative.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the state space.
act [T,out] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the action space.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_evaluate¶

function torchfort_rl_off_policy_evaluate(name, state, act, reward, stream)¶

Predicts the future reward based on the current state and selected action. Depending on the learning algorithm, the routine queries the target critic networks for this. The routine averages the predictions over all critics.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the state space.
act [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the action space.
reward [T,out] :: One-dimensional array of size (batch_size) which will hold the predicted reward values.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

System Management¶

torchfort_rl_off_policy_update_replay_buffer¶

function torchfort_rl_off_policy_update_replay_buffer(name, state_old, act_old, state_new, reward, terminal, stream)¶

Adds a new \((s, a, s', r, d)\) tuple to the replay buffer. Here \(s\) (state_old) is the state for which action \(a\) (action_old) was taken, leading to \(s'\) (state_new) and receiving reward \(r\) (reward). The terminal state flag \(d\) (final_state) specifies whether \(s'\) is the final state in the episode. For a local multi-env environment (n_envs>=1), the last dim on the passed tensors has to be equal to n_envs, an reward and terminal both have to be 1D tensors of size n_env as well. For single env (n_env=1), the env dimension can be omitted and in that case reward has to be a scalar and terminal a boolean flag.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state_old [T,in] :: Multi-dimensional array of size of the state space.
act_old [T,in] :: Multi-dimensional array of size of the action space.
state_new [T,in] :: Multi-dimensional array of size of the state space.
reward [T,in] :: Reward value.
final_state [logical,in] :: Terminal flag.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_is_ready¶

function torchfort_rl_off_policy_is_ready(name, ready)¶

Queries a reinforcement learning system for rediness to start training. A user should call this method before starting training to make sure the reinforcement learning system is ready. This ensures that the replay buffer is filled sufficiently with exploration data as specified during system creation.

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
ready [logical,out] :: Logical indicating if the system is ready for training.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_save_checkpoint¶

function torchfort_rl_off_policy_save_checkpoint(name, checkpoint_dir)¶

Saves a reinforcement learning training checkpoint to a directory. This method saves all models (policies, critics, target models if available) together with their corresponding optimizer and LR scheduler. states. It also saves the state of the replay buffer, to allow for smooth restarts of reinforcement learning training processes. This function should be used in conjunction with torchfort_rl_off_policy_load_checkpoint.

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
checkpoint_dir [character(:),in] :: A filesystem path to a directory to save the checkpoint data to.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_load_checkpoint¶

function torchfort_rl_off_policy_load_checkpoint(name, checkpoint_dir)¶

Restores a reinforcement learning system from a checkpoint. This method restores all models (policies, critics, target models if available) together with their corresponding optimizer and LR scheduler states. It also fully restores the state of the replay buffer, but not the current RNG seed. This function should be used in conjunction with torchfort_rl_off_policy_save_checkpoint.

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
checkpoint_dir [character(:),in] :: A filesystem path to a directory which contains the checkpoint data to load.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Weights and Biases Logging¶

torchfort_rl_off_policy_wandb_log_int¶

function torchfort_rl_off_policy_wandb_log_int(mname, metric_name, step, val)¶

Write an integer value to a Weights and Bias log. Use the _float and _double variants to write real32 and real64 values respectively.

Parameters:

mname [character(:),in] :: The name of model instance to associate this metric value with, as defined during model creation.
metric_name [character(:),in] :: Metric label.
step [integer,in] :: Training/inference step to associate with metric value.
val [integer,in] :: Metric value to log.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_off_policy_wandb_log_float¶

function torchfort_rl_off_policy_wandb_log_float(mname, metric_name, step, val)¶

torchfort_rl_off_policy_wandb_log_double¶

function torchfort_rl_off_policy_wandb_log_double(mname, metric_name, step, val)¶

On-Policy Algorithms¶

System Creation¶

Basic routines to create and register a reinforcement learning system in the internal registry. A (synchronous) data parallel distributed option is available.

torchfort_rl_on_policy_create_system¶

function torchfort_rl_on_policy_create_system(name, config_fname, model_device, rb_device)¶

Creates an on-policy reinforcement learning training system instance from a provided configuration file.

Parameters:

name [character(:),in] :: A name to assign to the created training system instance to use as a key for other TorchFort routines.
config_fname [character(:),in] :: The filesystem path to the user-defined configuration file to use.
model_device [integer,in] :: Which device to place and run the model on. For TORCHFORT_DEVICE_CPU (-1), model will be placed on CPU. For values >= 0, model will be placed on GPU with index corresponding to value.
rb_device [integer,in] :: Which device to place the rollout buffer on. For TORCHFORT_DEVICE_CPU (-1), rollout buffer will be placed on CPU. For values >= 0, it will be placed on GPU with index corresponding to value.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_create_distributed_system¶

function torchfort_rl_on_policy_create_distributed_system(name, config_fname, mpi_comm, model_device, rb_device)¶

Creates a (synchronous) data-parallel on-policy reinforcement learning system instance from a provided configuration file.

Parameters:

name [character(:),in] :: A name to assign to the created training system instance to use as a key for other TorchFort routines.
config_fname [character(:),in] :: The filesystem path to the user-defined configuration file to use.
mpi_comm [integer,in] :: MPI communicator to use to initialize NCCL communication library for data-parallel communication.
model_device [integer,in] :: Which device to place and run the model on. For TORCHFORT_DEVICE_CPU (-1), model will be placed on CPU. For values >= 0, model will be placed on GPU with index corresponding to value.
rb_device [integer,in] :: Which device to place the rollout buffer on. For TORCHFORT_DEVICE_CPU (-1), rollout buffer will be placed on CPU. For values >= 0, it will be placed on GPU with index corresponding to value.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Training/Evaluation¶

These routines are be used for training the reinforcement learning system or for steering the environment.

torchfort_rl_on_policy_train_step¶

function torchfort_rl_on_policy_train_step(name, p_loss_val, q_loss_val, stream)¶

Runs a training iteration of an on-policy refinforcement learning instance and returns loss values for policy and value functions. This routine samples a batch of specified size from the rollout buffer according to the buffers sampling procedure and performs a train step using this sample. The details of the training procedure are abstracted away from the user and depend on the chosen system algorithm. Note that the rollout buffer needs to be finalized or otherwise the train step will be skipped. For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
p_loss_val [T,out] :: A single or double precision variable which will hold the policy loss value computed during the training iteration.
q_loss_val [T,out] :: A single or double precision variable which will hold the critic loss value computed during the training iteration, averaged over all available critics (depends on the chosen algorithm).
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_predict_explore¶

function torchfort_rl_on_policy_predict_explore(name, state, act, stream)¶

Suggests an action based on the current state of the system and adds noise as specified by the coprresponding reinforcement learning system. Depending on the reinforcement learning algorithm used, the prediction is performed by the main network (not the target network). In contrast to torchfort_rl_off_policy_predict, this routine adds noise and thus is called explorative. The kind of noise is specified during system creation.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the state space.
act [T,out] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the action space.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_predict¶

function torchfort_rl_on_policy_predict(name, state, act, stream)¶

Suggests an action based on the current state of the system. Depending on the algorithm used, the prediction is performed by the target network. In contrast to torchfort_rl_on_policy_predict_explore, this routine does not add noise, which means it is exploitative.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the state space.
act [T,out] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the action space.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_evaluate¶

function torchfort_rl_on_policy_evaluate(name, state, act, reward, stream)¶

Predicts the future reward based on the current state and selected action. Depending on the learning algorithm, the routine queries the target critic networks for this. The routine averages the predictions over all critics.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the state space.
act [T,in] :: Multi-dimensional array of size (…, batch_size), depending on the dimensionality of the action space.
reward [T,out] :: One-dimensional array of size (batch_size) which will hold the predicted reward values.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

System Management¶

torchfort_rl_on_policy_update_rollout_buffer¶

function torchfort_rl_on_policy_update_rollout_buffer(name, state_old, act_old, state_new, reward, terminal, stream)¶

Adds a new \((s, a, r, d)\) tuple to the rollout buffer. Here \(s\) (state) is the state for which action \(a\) (action) was taken, leading to reward \(r\) (reward). The terminal state flag \(d\) (terminal) specifies whether the state is the final state in the episode. Note that value estimates \(q\) as well was log-probabilities are also stored but the user does not need to pass those manually, , those values are computed internally from the current policy and stored with the other values. For a local multi-env environment (n_envs>=1), the last dim on the passed tensors has to be equal to n_envs, an reward and terminal both have to be 1D tensors of size n_env as well. For single env (n_env=1), the env dimension can be omitted and in that case reward has to be a scalar and terminal a boolean flag.

For this operation, T can be one of real(real32), real(real64)

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
state [T,in] :: Multi-dimensional array of size of the state space.
act [T,in] :: Multi-dimensional array of size of the action space.
reward [T,in] :: Reward value.
terminal [logical,in] :: Terminal flag.
stream [integer(int64),in,optional] :: CUDA stream to enqueue the operation. This argument is ignored if the model is on the CPU.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_reset_rollout_buffer¶

function torchfort_rl_on_policy_reset_rollout_buffer(name)¶

This function call clears the rollout buffer and resets all variables.

Parameters:: name [character(:),in] :: The name of system instance to use, as defined during system creation.
Return:: res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_is_ready¶

function torchfort_rl_on_policy_is_ready(name, ready)¶

Queries a reinforcement learning system for rediness to start training. A user should call this method before starting training to make sure the reinforcement learning system is ready. This ensures that the rollout buffer is filled sufficiently with exploration data as specified during system creation. It also checks if the rollout buffer was properly finalized, e.g. all advantages were computed.

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
ready [logical,out] :: Logical indicating if the system is ready for training.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_save_checkpoint¶

function torchfort_rl_on_policy_save_checkpoint(name, checkpoint_dir)¶

Saves a reinforcement learning training checkpoint to a directory. This method saves all models (policies, critics, target models if available) together with their corresponding optimizer and LR scheduler. states. It also saves the state of the rollout buffer, to allow for smooth restarts of reinforcement learning training processes. This function should be used in conjunction with torchfort_rl_on_policy_load_checkpoint.

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
checkpoint_dir [character(:),in] :: A filesystem path to a directory to save the checkpoint data to.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_load_checkpoint¶

function torchfort_rl_on_policy_load_checkpoint(name, checkpoint_dir)¶

Restores a reinforcement learning system from a checkpoint. This method restores all models (policies, critics, target models if available) together with their corresponding optimizer and LR scheduler states. It also fully restores the state of the rollout buffer, but not the current RNG seed. This function should be used in conjunction with torchfort_rl_on_policy_save_checkpoint.

Parameters:

name [character(:),in] :: The name of system instance to use, as defined during system creation.
checkpoint_dir [character(:),in] :: A filesystem path to a directory which contains the checkpoint data to load.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

Weights and Biases Logging¶

torchfort_rl_on_policy_wandb_log_int¶

function torchfort_rl_on_policy_wandb_log_int(mname, metric_name, step, val)¶

Write an integer value to a Weights and Bias log. Use the _float and _double variants to write real32 and real64 values respectively.

Parameters:

mname [character(:),in] :: The name of model instance to associate this metric value with, as defined during model creation.
metric_name [character(:),in] :: Metric label.
step [integer,in] :: Training/inference step to associate with metric value.
val [integer,in] :: Metric value to log.

Return:

res [torchfort_result] :: TORCHFORT_RESULT_SUCCESS on success or error code on failure.

torchfort_rl_on_policy_wandb_log_float¶

function torchfort_rl_on_policy_wandb_log_float(mname, metric_name, step, val)¶

torchfort_rl_on_policy_wandb_log_double¶

function torchfort_rl_on_policy_wandb_log_double(mname, metric_name, step, val)¶