model_calib
Calibration utilities.
Functions
awq | Apply AWQ to the model.
local_hessian_calibrate | Calibrate the model using local Hessian-weighted MSE search.
max_calibrate | Calibrate the model using max.
sequential_calibrate | Sequential calibration: a layer-by-layer calibration algorithm.
smoothquant | Smooth-Quant variant with per-channel weight scaling.
svdquant | Lite version of SVDQuant.
- awq(model, forward_loop=None, algorithm='awq_lite', **kwargs)
Apply AWQ to the model.
- Parameters:
model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
algorithm (str)
See AWQFullCalibConfig for details on the remaining arguments.
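The `forward_loop` argument that recurs throughout this module is a callable that receives the model and pushes calibration data through it, returning nothing. A minimal sketch of that contract follows. `ToyModel`, `make_forward_loop`, and the batch values are hypothetical stand-ins invented for illustration, not part of this module, and a real model would be a `Module` rather than a plain Python class.

```python
from typing import Callable, Iterable, List


class ToyModel:
    """Hypothetical stand-in for a Module; counts the batches it receives."""

    def __init__(self) -> None:
        self.seen_batches = 0

    def __call__(self, batch: List[float]) -> List[float]:
        self.seen_batches += 1
        return [2.0 * x for x in batch]


def make_forward_loop(batches: Iterable[List[float]]) -> Callable[[ToyModel], None]:
    """Build a forward_loop: it takes the model, runs every batch, returns None."""
    stored = list(batches)

    def forward_loop(model: ToyModel) -> None:
        for batch in stored:
            model(batch)  # outputs are discarded; only the forward pass matters

    return forward_loop


calib_batches = [[0.1, -0.5], [1.5, 0.2], [-2.0, 0.7]]
forward_loop = make_forward_loop(calib_batches)

model = ToyModel()
result = forward_loop(model)
```

With a real model, a callable of this shape would be what gets passed as `forward_loop`, for example `awq(model, forward_loop=forward_loop)`.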
- local_hessian_calibrate(model, forward_loop=None, distributed_sync=True, step_size=0.1, start_multiplier=0.25, stop_multiplier=4.0, fp8_scale_sweep=True, block_size=16, debug=False)
Calibrate the model using local Hessian-weighted MSE search.
Instead of minimizing the weight error ||W - Wq||², this minimizes the Hessian-weighted error loss = (W - Wq)ᵀ H (W - Wq), where H = X @ X.T. This loss approximates the output reconstruction error ||WX - WqX||².
Per-block Hessians of shape (cin // block_size, block_size, block_size) are accumulated during the forward pass and used to weight the MSE loss during the scale search.
- Parameters:
model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model. Required for this algorithm.
distributed_sync (bool) – Whether to sync amax across distributed processes.
step_size (float) – Step size for amax search (default: 0.1).
start_multiplier (float) – Starting multiplier for amax search (default: 0.25).
stop_multiplier (float) – Ending multiplier for amax search (default: 4.0).
fp8_scale_sweep (bool) – If True, sweep over all 128 possible FP8 E4M3 scale values for NVFP4 per-block quantization (default: True).
block_size (int) – Block size for local Hessian computation (default: 16).
debug (bool) – If True, keep the local Hessian metadata on modules.
See LocalHessianCalibConfig for details on the configuration options.
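The relationship between the Hessian-weighted loss and the output error, and the role of `step_size`, `start_multiplier`, and `stop_multiplier`, can be illustrated for a single weight block. This is a sketch under stated assumptions, not the library's implementation: `fake_quantize` is a toy uniform quantizer standing in for NVFP4, the search treats `stop_multiplier` as inclusive, candidates are taken as multiples of max|w|, and all helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def block_hessian(x: Matrix) -> Matrix:
    """H = X @ X.T for activations X of shape (block_size, num_samples)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]


def hessian_loss(w: List[float], wq: List[float], h: Matrix) -> float:
    """(w - wq)^T H (w - wq) for one weight block."""
    d = [a - b for a, b in zip(w, wq)]
    n = len(d)
    return sum(d[i] * h[i][j] * d[j] for i in range(n) for j in range(n))


def output_error(w: List[float], wq: List[float], x: Matrix) -> float:
    """||w X - wq X||^2 computed directly, for comparison."""
    total = 0.0
    for s in range(len(x[0])):
        diff = sum((w[i] - wq[i]) * x[i][s] for i in range(len(w)))
        total += diff * diff
    return total


def fake_quantize(w: List[float], amax: float, levels: int = 7) -> List[float]:
    """Toy symmetric quantizer (assumption, not NVFP4): clip to +/-amax, round to a uniform grid."""
    out = []
    for v in w:
        clipped = max(-amax, min(amax, v))
        out.append(round(clipped / amax * levels) / levels * amax)
    return out


def search_amax(
    w: List[float],
    h: Matrix,
    step_size: float = 0.1,
    start_multiplier: float = 0.25,
    stop_multiplier: float = 4.0,
) -> Tuple[float, float]:
    """Try amax = multiplier * max|w| over the grid; keep the lowest Hessian-weighted loss."""
    base = max(abs(v) for v in w)
    best_amax, best_loss = base, float("inf")
    i = 0
    while start_multiplier + i * step_size <= stop_multiplier + 1e-9:
        amax = (start_multiplier + i * step_size) * base
        loss = hessian_loss(w, fake_quantize(w, amax), h)
        if loss < best_loss:
            best_amax, best_loss = amax, loss
        i += 1
    return best_amax, best_loss


# One block of block_size = 4 weights and 3 calibration samples (made-up values).
w = [0.9, -0.3, 0.05, 0.6]
x = [[1.0, 0.5, -2.0], [0.2, -0.1, 0.4], [3.0, 1.0, 0.5], [-0.5, 0.8, 0.1]]
h = block_hessian(x)
wq = fake_quantize(w, amax=0.9)
best_amax, best_loss = search_amax(w, h)
```

For a single linear block with H built from the same samples, `hessian_loss(w, wq, h)` and `output_error(w, wq, x)` agree up to floating-point error, which is why the search can score candidates from the accumulated Hessian without re-running the activations.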
- max_calibrate(model, forward_loop=None, distributed_sync=True)
Calibrate the model using max.
- Parameters:
model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
distributed_sync (bool) – Whether to sync input_quantizer amax across distributed processes.
See MaxCalibConfig for details on the remaining arguments.
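As a rough illustration of the idea behind max calibration, the sketch below records the largest absolute value seen while `forward_loop` runs and uses it as `amax`. `MaxObserver`, `ToyLinear`, and `toy_max_calibrate` are hypothetical names invented here. Calibrating the weight without data when `forward_loop` is None is an assumption of this sketch, and the distributed sync step is not modelled.

```python
from typing import Callable, List, Optional


class MaxObserver:
    """Hypothetical stand-in for a quantizer in calibration mode: tracks max(|x|)."""

    def __init__(self) -> None:
        self.amax: Optional[float] = None

    def observe(self, values: List[float]) -> None:
        batch_max = max(abs(v) for v in values)
        self.amax = batch_max if self.amax is None else max(self.amax, batch_max)


class ToyLinear:
    """Hypothetical one-output linear layer with input and weight observers."""

    def __init__(self, weight: List[float]) -> None:
        self.weight = weight
        self.input_observer = MaxObserver()
        self.weight_observer = MaxObserver()

    def __call__(self, x: List[float]) -> float:
        self.input_observer.observe(x)  # statistics are collected on every forward
        return sum(w * v for w, v in zip(self.weight, x))


def toy_max_calibrate(
    model: ToyLinear,
    forward_loop: Optional[Callable[[ToyLinear], None]] = None,
) -> None:
    # Assumption: weight statistics need no calibration data.
    model.weight_observer.observe(model.weight)
    if forward_loop is not None:
        forward_loop(model)  # input statistics come from the calibration batches


batches = [[0.5, -3.0], [1.0, 2.0]]


def forward_loop(model: ToyLinear) -> None:
    for batch in batches:
        model(batch)


layer = ToyLinear(weight=[0.25, -0.75])
toy_max_calibrate(layer, forward_loop)
```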
- sequential_calibrate(model, forward_loop, calib_func, **calib_kwargs)
Sequential calibration: a layer-by-layer calibration algorithm.
- Parameters:
model (Module)
forward_loop (Callable[[Module], None])
calib_func (Callable)
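The source gives only the signature here, so the following is one plausible reading of "layer-by-layer" rather than a description of the actual implementation: each layer is calibrated on the outputs of the layers before it, and extra keyword arguments are forwarded to `calib_func`. `ToyLayer`, `toy_max_calib`, `toy_sequential_calibrate`, and the `margin` keyword are all hypothetical.

```python
from typing import Callable, List


class ToyLayer:
    """Hypothetical layer: scales its input and tracks max(|input|) while calibrating."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.amax = 0.0
        self.calibrating = False

    def __call__(self, x: List[float]) -> List[float]:
        if self.calibrating:
            self.amax = max(self.amax, max(abs(v) for v in x))
        return [self.scale * v for v in x]


def toy_max_calib(layer: ToyLayer, forward_loop: Callable[[ToyLayer], None], margin: float = 1.0) -> None:
    """Hypothetical per-layer calibration function; `margin` shows kwargs passthrough."""
    layer.calibrating = True
    forward_loop(layer)
    layer.calibrating = False
    layer.amax *= margin


def toy_sequential_calibrate(
    layers: List[ToyLayer],
    batches: List[List[float]],
    calib_func: Callable[..., None],
    **calib_kwargs: float,
) -> None:
    current = [list(b) for b in batches]
    for layer in layers:
        inputs = current

        def layer_forward_loop(module: ToyLayer, inputs: List[List[float]] = inputs) -> None:
            for batch in inputs:
                module(batch)

        # Calibrate this layer on the activations that actually reach it.
        calib_func(layer, layer_forward_loop, **calib_kwargs)
        # Propagate so the next layer sees this layer's outputs.
        current = [layer(batch) for batch in inputs]


layers = [ToyLayer(scale=2.0), ToyLayer(scale=-0.5)]
toy_sequential_calibrate(layers, [[1.0, -3.0], [0.5, 2.0]], toy_max_calib, margin=1.0)
```

In this toy run the second layer is calibrated on the first layer's outputs, so its `amax` reflects values of magnitude up to 6.0 rather than the raw inputs.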
- smoothquant(model, forward_loop=None, alpha=1.0)
Smooth-Quant variant with per-channel weight scaling.
- Parameters:
model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
See SmoothQuantCalibConfig for details on the remaining arguments.
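For orientation, the textbook SmoothQuant transform divides each input channel of the activations by a scale and multiplies the matching weight column by the same scale, leaving the layer output unchanged. The sketch below uses that standard formula, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha). Since this function is described as a variant, its exact scaling rule may differ, and every helper name below is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def column_absmax(m: Matrix) -> List[float]:
    """Largest absolute value in each column (one column per input channel)."""
    return [max(abs(row[j]) for row in m) for j in range(len(m[0]))]


def smoothing_scales(act_absmax: List[float], weight_absmax: List[float], alpha: float = 1.0) -> List[float]:
    """Textbook SmoothQuant scale per input channel (assumes strictly positive maxima)."""
    return [a ** alpha / w ** (1.0 - alpha) for a, w in zip(act_absmax, weight_absmax)]


def linear(x: Matrix, w: Matrix) -> Matrix:
    """y = x @ w.T with x of shape (samples, cin) and w of shape (cout, cin)."""
    return [[sum(xv * wv for xv, wv in zip(x_row, w_row)) for w_row in w] for x_row in x]


# Channel 1 of the activations is an outlier channel (made-up values).
x = [[1.0, 20.0], [-0.5, 10.0]]
w = [[0.3, 0.02], [-0.1, 0.05]]

scales = smoothing_scales(column_absmax(x), column_absmax(w), alpha=1.0)

# Divide activations by the scale and fold the same scale into the weights.
x_smooth = [[v / s for v, s in zip(row, scales)] for row in x]
w_smooth = [[v * s for v, s in zip(row, scales)] for row in w]

y_ref = linear(x, w)
y_smooth = linear(x_smooth, w_smooth)
```

With `alpha=1.0`, the default in the signature above, the textbook formula moves all of the activation range into the weights, so every smoothed activation channel has an absolute maximum of 1.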
- svdquant(model, forward_loop=None, lowrank=32, **kwargs)
Lite version of SVDQuant.
- Parameters:
model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
lowrank (int)
See SVDQuantConfig for details on the remaining arguments.