# apex.optimizers

`class apex.optimizers.FusedAdam(params, lr=0.001, bias_correction=True, betas=(0.9, 0.999), eps=1e-08, eps_inside_sqrt=False, weight_decay=0.0, max_grad_norm=0.0, amsgrad=False)`

Implements the Adam algorithm. Currently GPU-only. Requires Apex to be installed via `python setup.py install --cuda_ext --cpp_ext`.

It has been proposed in *Adam: A Method for Stochastic Optimization*.

Parameters:

- **params** (iterable) – iterable of parameters to optimize or dicts defining parameter groups
- **lr** (float, optional) – learning rate (default: 1e-3)
- **betas** (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999))
- **eps** (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
- **weight_decay** (float, optional) – weight decay (L2 penalty) (default: 0)
- **amsgrad** (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper *On the Convergence of Adam and Beyond*. NOT SUPPORTED in FusedAdam! (default: False)
- **eps_inside_sqrt** (boolean, optional) – in the 'update parameters' step, adds eps to the bias-corrected second-moment estimate before evaluating the square root, instead of adding it to the square root of the second-moment estimate as in the original paper (default: False)
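To make the `eps_inside_sqrt` distinction concrete, here is a plain-Python sketch of a single Adam update for one scalar parameter. The function and variable names (`adam_update`, `m`, `v`, etc.) are illustrative, not Apex internals:

```python
import math

def adam_update(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-8, eps_inside_sqrt=False):
    """One Adam step for a scalar parameter (illustrative sketch only)."""
    m = beta1 * m + (1 - beta1) * g        # first-moment running average
    v = beta2 * v + (1 - beta2) * g * g    # second-moment running average
    m_hat = m / (1 - beta1 ** step)        # bias correction
    v_hat = v / (1 - beta2 ** step)
    if eps_inside_sqrt:
        denom = math.sqrt(v_hat + eps)     # eps added before the square root
    else:
        denom = math.sqrt(v_hat) + eps     # original-paper formulation
    p = p - lr * m_hat / denom
    return p, m, v

p_new, _, _ = adam_update(1.0, 0.5, 0.0, 0.0, step=1)
print(p_new)  # -> 0.999 (first step moves the parameter by roughly lr)
```

The two variants differ meaningfully only when the second-moment estimate is near zero, where the placement of `eps` changes the effective step size.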
`step(closure=None, grads=None, output_params=None, scale=1.0, grad_norms=None)`

Performs a single optimization step.

Parameters:

- **closure** (callable, optional) – a closure that reevaluates the model and returns the loss
- **grads** (list of tensors, optional) – weight gradients to use for the optimizer update. If gradients are of type torch.half, parameters are expected to be of type torch.float (default: None)
- **output_params** (list of tensors, optional) – a reduced-precision copy of the updated weights, written out in addition to the regular updated weights. Must be of the same type as the gradients (default: None)
- **scale** (float, optional) – factor to divide gradient tensor values by before applying them to the weights (default: 1)
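The `scale` argument supports loss-scaled mixed-precision training: gradients computed from a loss that was multiplied by a scale factor are divided by that same factor before the update. A minimal pure-Python sketch of that unscaling step (the helper name `unscale_grads` is illustrative, not part of Apex):

```python
def unscale_grads(grads, scale=1.0):
    """Divide each gradient by the loss scale before the optimizer update."""
    return [g / scale for g in grads]

# Gradients produced from a loss that was multiplied by scale=128
scaled = [256.0, -64.0]
print(unscale_grads(scaled, scale=128.0))  # -> [2.0, -0.5]
```

Fusing this division into the optimizer kernel avoids a separate pass over the gradients, which is the point of accepting `scale` in `step` rather than requiring the caller to unscale first.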