FusedAdam(params, lr=0.001, bias_correction=True, betas=(0.9, 0.999), eps=1e-08, eps_inside_sqrt=False, weight_decay=0.0, max_grad_norm=0.0, amsgrad=False)
Implements the Adam algorithm. Currently GPU-only. Requires Apex to be installed via
python setup.py install --cuda_ext --cpp_ext.
The algorithm was proposed in the paper "Adam: A Method for Stochastic Optimization".
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups.
- lr (float, optional) – learning rate. (default: 1e-3)
- betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square. (default: (0.9, 0.999))
- eps (float, optional) – term added to the denominator to improve numerical stability. (default: 1e-8)
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
- amsgrad (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". NOT SUPPORTED in FusedAdam! (default: False)
- eps_inside_sqrt (boolean, optional) – in the ‘update parameters’ step, add eps to the bias-corrected second moment estimate before taking the square root, instead of adding it to the square root of the second moment estimate as in the original paper. (default: False)
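The effect of eps_inside_sqrt can be illustrated with a minimal pure-Python sketch of one Adam update on a single scalar parameter. This is not the fused CUDA kernel: the function name adam_step_sketch and its argument layout are hypothetical, weight decay is left out, and bias correction is always applied, following the description in the list above.

```python
import math


def adam_step_sketch(param, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999),
                     eps=1e-8, eps_inside_sqrt=False):
    """One Adam update for a scalar parameter (illustrative sketch only).

    `step` is the 1-based update count. Returns (new_param, new_m, new_v).
    """
    beta1, beta2 = betas
    # Running averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias-corrected first and second moment estimates.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    if eps_inside_sqrt:
        # eps is added to the second moment estimate before the square root.
        denom = math.sqrt(v_hat + eps)
    else:
        # Original paper: eps is added after the square root.
        denom = math.sqrt(v_hat) + eps
    return param - lr * m_hat / denom, m, v


if __name__ == "__main__":
    # A small gradient makes the two placements of eps visibly different.
    for inside in (False, True):
        p, _, _ = adam_step_sketch(0.0, 1e-4, 0.0, 0.0, step=1,
                                   eps_inside_sqrt=inside)
        print(f"eps_inside_sqrt={inside}: param after one step = {p:.6e}")
```

With a gradient of 1e-4 and the default eps of 1e-8, the squared gradient is the same size as eps, so placing eps inside the square root shrinks the first step to roughly 0.71 of the step taken with eps outside. For gradients much larger than sqrt(eps) the two variants are nearly identical.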
step(closure=None, grads=None, output_params=None, scale=1.0, grad_norms=None)
Performs a single optimization step.
- closure (callable, optional) – A closure that reevaluates the model and returns the loss.
- grads (list of tensors, optional) – weight gradients to use for the optimizer update. If the gradients have type torch.half, the parameters are expected to have type torch.float. (default: None)
- output_params (list of tensors, optional) – a reduced-precision copy of the updated weights, written out in addition to the regular updated weights. Must be of the same type as the gradients. (default: None)
- scale (float, optional) – factor to divide gradient tensor values by before applying to weights. (default: 1)
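How grads, output_params and scale fit together in a mixed-precision setup can be sketched in plain Python. The sketch below is an assumption-laden illustration, not Apex code: step_sketch and to_half are hypothetical helpers, Python floats stand in for tensors, half precision is emulated with the standard struct module, and a plain gradient-descent update stands in for the Adam update.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def step_sketch(master_params, scaled_grads, output_params, lr=1e-3, scale=1.0):
    """Update full-precision weights in place from loss-scaled gradients.

    master_params: list of floats, the regular (full-precision) weights.
    scaled_grads:  list of floats, gradients already multiplied by `scale`.
    output_params: list of floats, receives the reduced-precision copy.
    """
    for i, (p, g) in enumerate(zip(master_params, scaled_grads)):
        g = g / scale                 # divide the gradient by `scale` first
        p = p - lr * g                # stand-in update (NOT the Adam rule)
        master_params[i] = p          # regular updated weight
        output_params[i] = to_half(p)  # reduced-precision copy


if __name__ == "__main__":
    master = [1.0, -2.0]
    grads = [128.0, 64.0]   # true gradients 1.0 and 0.5, loss-scaled by 128
    half_copy = [0.0, 0.0]
    step_sketch(master, grads, half_copy, lr=0.1, scale=128.0)
    print("master weights:", master)
    print("half-precision copy:", half_copy)
```

The point of the design is that the full-precision weights accumulate small updates accurately, while the reduced-precision copy is what a half-precision forward pass would read.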