warp.optim.SGD
- class warp.optim.SGD(params=None, lr=0.001, momentum=0.0, dampening=0.0, weight_decay=0.0, nesterov=False)
Stochastic Gradient Descent (SGD) optimizer with optional momentum.
This optimizer implements gradient descent with support for momentum, Nesterov accelerated gradient, and weight decay (L2 regularization).
The interface is similar to PyTorch’s torch.optim.SGD.
- Parameters:
params – List of warp.array objects to optimize. Can be None and set later via set_params().
lr – Learning rate (step size).
momentum – Momentum factor for accelerating SGD in relevant directions.
dampening – Dampening factor applied to the momentum.
weight_decay – Weight decay coefficient (L2 regularization).
nesterov – Whether to use Nesterov momentum. Requires momentum > 0 and dampening = 0.
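A minimal usage sketch, assuming Warp's standard autodiff workflow (wp.Tape and arrays created with requires_grad=True); the quadratic loss kernel and training loop are illustrative placeholders, not part of this class:

```python
import warp as wp
import warp.optim

wp.init()

# Illustrative objective: minimize the sum of squares of the parameter vector.
@wp.kernel
def loss_kernel(x: wp.array(dtype=float), loss: wp.array(dtype=float)):
    i = wp.tid()
    wp.atomic_add(loss, 0, x[i] * x[i])

x = wp.array([1.0, -2.0, 3.0], dtype=float, requires_grad=True)
loss = wp.zeros(1, dtype=float, requires_grad=True)

opt = warp.optim.SGD([x], lr=0.1, momentum=0.9, nesterov=True)

for _ in range(100):
    loss.zero_()
    tape = wp.Tape()
    with tape:
        wp.launch(loss_kernel, dim=x.shape[0], inputs=[x, loss])
    tape.backward(loss=loss)
    opt.step([x.grad])  # gradients passed in the same order as params
    tape.zero()         # clear accumulated gradients before the next iteration
```

Because nesterov=True, the sketch uses a nonzero momentum and the default dampening of 0.0, as required by the constraint above.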
- __init__(params=None, lr=0.001, momentum=0.0, dampening=0.0, weight_decay=0.0, nesterov=False)
Reset momentum buffers and timestep to zero.
Methods
__init__([params, lr, momentum, dampening, ...]) – Reset momentum buffers and timestep to zero.
set_params(params) – Set parameters to optimize and allocate momentum buffers.
step(grad) – Apply one SGD step using the provided gradients.
step_detail(g, b, lr, momentum, dampening, ...) – Apply an SGD update to a single parameter array.
- set_params(params)
Set parameters to optimize and allocate momentum buffers.
- Parameters:
params – List of warp.array objects to optimize, or None.
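A brief sketch of deferred parameter assignment (the arrays x and y are hypothetical placeholders created elsewhere):

```python
import warp.optim

opt = warp.optim.SGD(lr=1e-2, momentum=0.9)  # construct without params
# ... later, once the parameter arrays exist:
opt.set_params([x, y])  # allocates momentum buffers matching x and y
```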
- step(grad)
Apply one SGD step using the provided gradients.
- Parameters:
grad – List of gradient arrays matching params.
- static step_detail(g, b, lr, momentum, dampening, weight_decay, nesterov, t, params)
Apply an SGD update to a single parameter array.
- Parameters:
g – Gradient array.
b – Momentum buffer.
lr – Learning rate.
momentum – Momentum factor.
dampening – Momentum dampening factor.
weight_decay – Weight decay coefficient.
nesterov – Whether to use Nesterov momentum.
t – Current step index.
params – Parameter array to update in-place.
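Presumably, step() delegates to this static method for each parameter array. A hypothetical direct call is sketched below; x is a placeholder wp.array with requires_grad=True, and buf stands in for the momentum buffer that set_params() would normally allocate (assumed here to be an array of the same shape and dtype as the parameter):

```python
import warp as wp
import warp.optim

buf = wp.zeros_like(x)  # momentum buffer, same shape/dtype as the parameter
warp.optim.SGD.step_detail(
    g=x.grad,          # gradient of x
    b=buf,             # momentum buffer, carried across steps
    lr=1e-2,
    momentum=0.9,
    dampening=0.0,
    weight_decay=0.0,
    nesterov=False,
    t=0,               # step index (0 for the first update)
    params=x,          # parameter array, updated in place
)
```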