model_calib

Calibration utilities.

Functions

awq

Apply AWQ to the model.

local_hessian_calibrate

Calibrate the model using local Hessian-weighted MSE search.

max_calibrate

Calibrate the model using max calibration.

sequential_calibrate

Sequential calibration - a sequential layer-by-layer calibration algorithm.

smoothquant

Smooth-Quant variant with per-channel weight scaling.

svdquant

Lite version of SVDQuant.

awq(model, forward_loop=None, algorithm='awq_lite', **kwargs)

Apply AWQ to the model.

Parameters:
  • model (Module) – Model to be calibrated.

  • forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.

  • algorithm (str) – Name of the AWQ variant to apply (default: 'awq_lite').

See AWQFullCalibConfig for details on the remaining arguments.
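The forward_loop argument shared by the functions in this module is a plain callable that receives the model and runs calibration data through it. The sketch below shows one way to build such a callable. The names make_forward_loop and calib_batches, and the stand-in model, are hypothetical and are not part of this module.

```python
from typing import Callable, Iterable, List


def make_forward_loop(calib_batches: Iterable) -> Callable[[Callable], None]:
    """Return a forward_loop that feeds every calibration batch to the model."""
    batches = list(calib_batches)  # materialize so the loop can be run more than once

    def forward_loop(model: Callable) -> None:
        for batch in batches:
            model(batch)  # outputs are discarded; only the forward pass matters

    return forward_loop


# Stand-in "model" (hypothetical): it simply records each batch it receives.
seen: List[int] = []
forward_loop = make_forward_loop([1, 2, 3])
forward_loop(seen.append)
```

With a real Module, the same callable would be passed as forward_loop=forward_loop to any of the calibration functions documented here.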

local_hessian_calibrate(model, forward_loop=None, distributed_sync=True, step_size=0.1, start_multiplier=0.25, stop_multiplier=4.0, fp8_scale_sweep=True, block_size=16, debug=False)

Calibrate the model using local Hessian-weighted MSE search.

Instead of minimizing the plain weight error ||W - Wq||², this algorithm minimizes the Hessian-weighted error (W - Wq)ᵀ H (W - Wq), where H = X @ X.T. This loss approximates the output reconstruction error ||WX - WqX||².

Per-block Hessians of shape (cin // block_size, block_size, block_size) are accumulated during the forward pass and used to weight the MSE loss during the scale search.
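The relationship between the Hessian-weighted loss and the output error can be checked numerically on a toy example. The plain-Python sketch below handles a single weight row with made-up numbers. It illustrates the math only and is not the library's implementation; all function names in it are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def hessian(xs: List[Vector]) -> List[Vector]:
    """H = sum over samples of x xᵀ, i.e. X @ X.T with samples as columns of X."""
    n = len(xs[0])
    return [[sum(x[i] * x[j] for x in xs) for j in range(n)] for i in range(n)]


def hessian_weighted_loss(w: Vector, wq: Vector, h: List[Vector]) -> float:
    """(w - wq)ᵀ H (w - wq) for a single weight row."""
    d = [a - b for a, b in zip(w, wq)]
    n = len(d)
    return sum(d[i] * h[i][j] * d[j] for i in range(n) for j in range(n))


def output_error(w: Vector, wq: Vector, xs: List[Vector]) -> float:
    """||w X - wq X||² computed directly from the activations."""
    return sum((dot(w, x) - dot(wq, x)) ** 2 for x in xs)


# Toy data (made up): one weight row with cin = 4 and three activation samples.
w = [0.9, -1.2, 0.4, 2.1]
wq = [1.0, -1.0, 0.5, 2.0]  # stands in for the quantized row
xs = [[1.0, 2.0, 0.5, -1.0], [0.0, 1.0, 3.0, 2.0], [-2.0, 0.5, 1.0, 1.0]]

# With the full Hessian the weighted loss equals the output error.
full_loss = hessian_weighted_loss(w, wq, hessian(xs))

# Per-block variant with block_size = 2: cin // block_size = 2 Hessians of
# shape 2 x 2, one per block of input channels. Cross-block terms are dropped.
block_size = 2
block_loss = 0.0
for start in range(0, len(w), block_size):
    stop = start + block_size
    h_block = hessian([x[start:stop] for x in xs])
    block_loss += hessian_weighted_loss(w[start:stop], wq[start:stop], h_block)
```

Here full_loss matches output_error(w, wq, xs) up to floating-point rounding, while block_loss keeps only the block-diagonal part of H, which is why the per-block form is an approximation.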

Parameters:
  • model (Module) – Model to be calibrated.

  • forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model. Required for this algorithm.

  • distributed_sync (bool) – Whether to sync amax across distributed processes (default: True).

  • step_size (float) – Step size for amax search (default: 0.1).

  • start_multiplier (float) – Starting multiplier for amax search (default: 0.25).

  • stop_multiplier (float) – Ending multiplier for amax search (default: 4.0).

  • fp8_scale_sweep (bool) – If True, sweep over all 128 possible FP8 E4M3 scale values for NVFP4 per-block quantization (default: True).

  • block_size (int) – Block size for local Hessian computation (default: 16).

  • debug (bool) – If True, keep the local Hessian metadata on modules (default: False).

See LocalHessianCalibConfig for details on the configuration options.
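The step_size, start_multiplier, and stop_multiplier parameters describe a sweep over candidate amax values. The sketch below shows one plausible reading of such a sweep. The candidate grid, the toy symmetric quantizer, and every function name are assumptions made for illustration; the library's actual search may differ. Plain MSE is used as the loss here, and a Hessian-weighted loss could be passed in instead.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def multiplier_candidates(start: float, stop: float, step: float) -> Vector:
    """Multipliers start, start + step, ... that do not exceed stop."""
    count = int((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(count)]


def fake_quantize(values: Vector, amax: float, levels: int = 7) -> Vector:
    """Toy symmetric quantizer: round to `levels` steps per side, clip at amax."""
    scale = amax / levels
    return [max(-levels, min(levels, round(v / scale))) * scale for v in values]


def mse(w: Vector, wq: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(w, wq))


def search_amax(
    values: Vector,
    loss_fn: Callable[[Vector, Vector], float],
    start: float = 0.25,
    stop: float = 4.0,
    step: float = 0.1,
) -> Tuple[float, float]:
    """Try amax = multiplier * max|values| and keep the lowest-loss candidate."""
    base_amax = max(abs(v) for v in values)  # assumes values is not all zeros
    best_amax, best_loss = base_amax, float("inf")
    for multiplier in multiplier_candidates(start, stop, step):
        amax = multiplier * base_amax
        loss = loss_fn(values, fake_quantize(values, amax))
        if loss < best_loss:
            best_amax, best_loss = amax, loss
    return best_amax, best_loss


# Toy weights (made up) with one outlier that dominates max|w|.
weights = [0.1, -0.2, 0.15, 0.05, -0.1, 3.0]
candidates = multiplier_candidates(0.25, 4.0, 0.1)
best_amax, best_loss = search_amax(weights, mse)
```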

max_calibrate(model, forward_loop=None, distributed_sync=True)

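Calibrate the model using max calibration: each quantizer's amax is taken from the maximum absolute value observed during calibration.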

Parameters:
  • model (Module) – Model to be calibrated.

  • forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.

  • distributed_sync (bool) – Whether to sync input_quantizer amax across distributed processes.

See MaxCalibConfig for details on the remaining arguments.
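As a rough illustration of the idea, the sketch below tracks a running maximum absolute value over calibration batches. MaxObserver and the sample data are hypothetical and only mirror the concept; they are not the classes used by this module. Treating cross-process syncing as a max-reduction is likewise an assumption.

```python
from typing import Iterable, List


class MaxObserver:
    """Tracks the running maximum absolute value (amax) of the data it sees."""

    def __init__(self) -> None:
        self.amax = 0.0

    def observe(self, values: Iterable[float]) -> None:
        batch_max = max((abs(v) for v in values), default=0.0)
        self.amax = max(self.amax, batch_max)


# Toy calibration batches (made up).
calib_batches: List[List[float]] = [[0.5, -1.5, 0.2], [2.5, -0.1], [-3.0, 1.0]]

observer = MaxObserver()
for batch in calib_batches:
    observer.observe(batch)

# Assumption: syncing across processes amounts to taking the max over ranks.
per_rank_amax = [observer.amax, 4.5]  # second value stands in for another rank
synced_amax = max(per_rank_amax)
```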

sequential_calibrate(model, forward_loop, calib_func, **calib_kwargs)

Sequential calibration - a sequential layer-by-layer calibration algorithm.

Parameters:
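  • model (Module) – Model to be calibrated.

  • forward_loop (Callable[[Module], None]) – A callable which takes the model as argument and forwards calibration data through the model.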

  • calib_func (Callable)
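The source gives no further detail on how the layers are visited, so the following is only a generic sketch of layer-by-layer calibration: each layer is calibrated on the inputs it would actually receive, and its outputs then become the inputs of the next layer. The helper calibrate_layer_by_layer and the calib_func(layer, inputs) calling convention are assumptions for illustration, not this function's actual contract.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]


def calibrate_layer_by_layer(
    layers: Sequence[Layer],
    calib_inputs: Sequence[float],
    calib_func: Callable[[Layer, List[float]], float],
) -> List[float]:
    """Calibrate each layer in order, then propagate its outputs forward."""
    inputs = list(calib_inputs)
    results = []
    for layer in layers:
        results.append(calib_func(layer, inputs))  # calibrate on this layer's inputs
        inputs = [layer(x) for x in inputs]        # outputs feed the next layer
    return results


def record_input_amax(layer: Layer, inputs: List[float]) -> float:
    """Toy calibration function: report the max absolute input value."""
    return max(abs(x) for x in inputs)


def double(x: float) -> float:
    return 2.0 * x


def shift(x: float) -> float:
    return x - 3.0


# Two toy "layers" and three toy calibration inputs (all made up).
results = calibrate_layer_by_layer([double, shift], [1.0, -2.0, 3.0], record_input_amax)
```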

smoothquant(model, forward_loop=None, alpha=1.0)

Smooth-Quant variant with per-channel weight scaling.

Parameters:
  • model (Module) – Model to be calibrated.

  • forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.

See SmoothQuantCalibConfig for details on the remaining arguments.
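For orientation, the sketch below applies the smoothing rule from the original SmoothQuant formulation, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), to a single weight row. Since this function is described as a variant, its exact scaling may differ; the numbers and the helper smoothing_scales are made up for illustration.

```python
from typing import List

Vector = List[float]


def smoothing_scales(act_amax: Vector, weight_amax: Vector, alpha: float) -> Vector:
    """Per-input-channel scale s_j = act_amax_j**alpha / weight_amax_j**(1 - alpha)."""
    return [a ** alpha / w ** (1.0 - alpha) for a, w in zip(act_amax, weight_amax)]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


# Toy activation sample and weight row (made up); channel 0 is an outlier.
x = [8.0, 0.5, -2.0]
w = [0.25, 1.0, -0.5]
act_amax = [abs(v) for v in x]
weight_amax = [abs(v) for v in w]

# alpha = 0.5 splits the difficulty between activations and weights.
s_half = smoothing_scales(act_amax, weight_amax, alpha=0.5)
x_half = [v / s for v, s in zip(x, s_half)]
w_half = [v * s for v, s in zip(w, s_half)]

# alpha = 1.0 (the default in the signature above) moves all of the activation
# range into the weights, so every smoothed activation channel peaks at 1.
s_one = smoothing_scales(act_amax, weight_amax, alpha=1.0)
x_one = [v / s for v, s in zip(x, s_one)]
w_one = [v * s for v, s in zip(w, s_one)]
```

In both cases the layer output is unchanged, because dividing the activations and multiplying the weights by the same per-channel scale cancels in the dot product.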

svdquant(model, forward_loop=None, lowrank=32, **kwargs)

Lite version of SVDQuant.

Parameters:
  • model (Module) – Model to be calibrated.

  • forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.

  • lowrank (int) – Rank of the low-rank decomposition (default: 32).

See SVDQuantConfig for details on the remaining arguments.