apex.optimizers

class apex.optimizers.FusedAdam(params, lr=0.001, bias_correction=True, betas=(0.9, 0.999), eps=1e-08, eps_inside_sqrt=False, weight_decay=0.0, max_grad_norm=0.0, amsgrad=False)
Implements the Adam algorithm. Currently GPU-only. Requires Apex to be installed via python setup.py install --cuda_ext --cpp_ext.
The algorithm was proposed in Adam: A Method for Stochastic Optimization.
Parameters:  params (iterable) – iterable of parameters to optimize or dicts defining parameter groups.
 lr (float, optional) – learning rate. (default: 1e-3)
 betas (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square. (default: (0.9, 0.999))
 eps (float, optional) – term added to the denominator to improve numerical stability. (default: 1e-8)
 weight_decay (float, optional) – weight decay (L2 penalty). (default: 0)
 amsgrad (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond. (default: False) NOT SUPPORTED in FusedAdam!
 eps_inside_sqrt (boolean, optional) – in the ‘update parameters’ step, adds eps to the bias-corrected second moment estimate before evaluating the square root, instead of adding it to the square root of the second moment estimate as in the original paper. (default: False)
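The effect of eps_inside_sqrt is easiest to see on a single scalar parameter. The following is a minimal plain-Python sketch of the textbook Adam update as described in the parameter list above. It is not the fused CUDA kernel, it omits weight decay, and the helper name adam_update is hypothetical and not part of Apex.

```python
import math

def adam_update(p, g, m, v, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                eps_inside_sqrt=False, bias_correction=True):
    """One scalar Adam update (illustrative only); returns (new_p, new_m, new_v)."""
    beta1, beta2 = betas
    # Running averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    if bias_correction:
        m_hat = m / (1 - beta1 ** step)
        v_hat = v / (1 - beta2 ** step)
    else:
        m_hat, v_hat = m, v
    if eps_inside_sqrt:
        # eps is added before taking the square root.
        denom = math.sqrt(v_hat + eps)
    else:
        # Original paper: eps is added to the square root.
        denom = math.sqrt(v_hat) + eps
    return p - lr * m_hat / denom, m, v

# A very small gradient makes the difference between the two placements visible.
p_outside, _, _ = adam_update(1.0, 1e-6, 0.0, 0.0, step=1)
p_inside, _, _ = adam_update(1.0, 1e-6, 0.0, 0.0, step=1, eps_inside_sqrt=True)
step_outside = 1.0 - p_outside
step_inside = 1.0 - p_inside
```

With a gradient this small, sqrt(v_hat + eps) is dominated by eps, so the update taken with eps_inside_sqrt=True is much smaller than the one taken with the default placement. For gradients well above sqrt(eps), the two variants behave almost identically.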

step(closure=None, grads=None, output_params=None, scale=1.0, grad_norms=None)
Performs a single optimization step.
Parameters:  closure (callable, optional) – A closure that re-evaluates the model and returns the loss.
 grads (list of tensors, optional) – weight gradients to use for the optimizer update. If the gradients have type torch.half, the parameters are expected to be of type torch.float. (default: None)
 output_params (list of tensors, optional) – A reduced-precision copy of the updated weights, written out in addition to the regular updated weights. Must be of the same type as the gradients. (default: None)
 scale (float, optional) – factor to divide gradient tensor values by before applying to weights. (default: 1)
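The scale argument is typically useful together with loss scaling: if the loss was multiplied by a constant before the backward pass, every gradient carries that same factor, and dividing by it recovers the true gradients. The sketch below illustrates only this arithmetic in plain Python. The helper unscale and the variable names are hypothetical, and real usage would operate on tensors rather than lists of floats.

```python
def unscale(grads, scale=1.0):
    """Divide every gradient value by `scale`, mirroring the description of step(scale=...)."""
    return [g / scale for g in grads]

loss_scale = 128.0                    # assumed loss-scaling factor (a power of two)
true_grads = [0.5, -0.25, 2.0]        # gradients of the unscaled loss
# Gradients as they would come out of a backward pass on loss * loss_scale.
scaled_grads = [g * loss_scale for g in true_grads]
# Dividing by the same factor recovers the original gradients.
recovered = unscale(scaled_grads, scale=loss_scale)
```

Under these assumptions, passing the scaled gradients as grads together with scale=loss_scale to step would apply the update as if the unscaled gradients had been supplied. Choosing a power of two for the factor keeps the multiplication and division exact in floating point.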