model_calib

Calibration utilities.

Functions

`awq`	Apply AWQ to the model.
`layerwise_calibrate`	Layerwise calibration - a layer-by-layer calibration algorithm.
`local_hessian_calibrate`	Calibrate weight quantizers by minimizing the Hessian-weighted error.
`max_calibrate`	Calibrate the model using max.
`smoothquant`	Smooth-Quant variant with per-channel weight scaling.
`svdquant`	Lite version of SVDQuant.

awq(model, forward_loop=None, algorithm='awq_lite', **kwargs)

Apply AWQ to the model.

Parameters:

model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
algorithm (str) – Name of the AWQ variant to apply (default: 'awq_lite').

See AWQFullCalibConfig for details on the remaining arguments.
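
The forward_loop argument used throughout this module is a callable that receives the model and pushes calibration batches through it. A minimal sketch follows, assuming a hypothetical make_forward_loop helper and a stand-in model. Neither name comes from this module, and a real loop would usually iterate over a data loader under a no-grad context.

```python
from typing import Callable, Iterable, List


def make_forward_loop(batches: Iterable[List[float]]) -> Callable[[Callable], None]:
    """Build a forward_loop closure over a fixed set of calibration batches.

    Hypothetical helper for illustration; not part of model_calib.
    """
    batches = list(batches)  # materialize so the loop can be replayed

    def forward_loop(model: Callable) -> None:
        # Outputs are discarded: calibration only needs the data to flow
        # through the model so its quantizers can observe it.
        for batch in batches:
            model(batch)

    return forward_loop


class RecordingModel:
    """Stand-in for a Module: records every batch it is called with."""

    def __init__(self) -> None:
        self.seen: List[List[float]] = []

    def __call__(self, batch: List[float]) -> List[float]:
        self.seen.append(batch)
        return batch


calib_batches = [[0.5, -1.0], [2.0, 0.25]]
forward_loop = make_forward_loop(calib_batches)
stub_model = RecordingModel()
forward_loop(stub_model)
```

With a real model, the same closure would be passed as awq(model, forward_loop=forward_loop).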

layerwise_calibrate(model, forward_loop, calib_func, **calib_kwargs)

Layerwise calibration - a layer-by-layer calibration algorithm.

Runs a full-model forward pass for each layer, patching the decoder layers with a skip / run / capture strategy so that inter-layer logic in parent modules (e.g. mask construction) executes naturally, without model-specific hooks.

If checkpoint_dir is passed (via calib_kwargs), per-layer checkpoints are saved after each layer completes. On restart, calibration resumes from the last completed layer.

get_qdq_activations_from_prev_layer (via calib_kwargs) controls how the cached inputs handed to layer N+1 are produced. If True, they come from a forward pass through the just-calibrated layer with its quantizers active (e.g. GPTQ). If False, the quantizers are temporarily disabled for that forward pass, which matches non-layerwise max-calib semantics.

Parameters:

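model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None]) – A callable which takes the model as argument and forwards calibration data through the model.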
calib_func (Callable)
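
The effect of get_qdq_activations_from_prev_layer can be illustrated with a toy pipeline. The sketch below does not reproduce the skip / run / capture patching described above. It only shows, under simplified assumptions, how the inputs cached for layer N+1 differ depending on whether the quantizer of layer N is active. ToyLayer, toy_layerwise_calibrate, and the rounding "quantizer" are all hypothetical.

```python
from typing import List


class ToyLayer:
    """Scales its input; optionally fake-quantizes the output by rounding."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.amax = None           # filled in by calibration
        self.quant_enabled = True

    def forward(self, xs: List[float]) -> List[float]:
        ys = [self.scale * x for x in xs]
        if self.quant_enabled and self.amax is not None:
            # Crude stand-in for fake quantization:
            # clamp to [-amax, amax], then round to the nearest integer.
            ys = [float(round(max(-self.amax, min(self.amax, y)))) for y in ys]
        return ys


def toy_layerwise_calibrate(layers, inputs, qdq_from_prev_layer):
    """Calibrate layers one at a time; return the inputs each layer saw."""
    cached = inputs
    history = []
    for layer in layers:
        history.append(cached)
        # "Calibrate" this layer: record the max absolute unquantized output.
        layer.quant_enabled = False
        layer.amax = max(abs(y) for y in layer.forward(cached))
        # Produce the next layer's inputs with the quantizer on or off.
        layer.quant_enabled = qdq_from_prev_layer
        cached = layer.forward(cached)
        layer.quant_enabled = True  # restore for later use
    return history


inputs = [0.3, 1.2]
with_qdq = toy_layerwise_calibrate([ToyLayer(1.5), ToyLayer(2.0)], inputs, True)
without_qdq = toy_layerwise_calibrate([ToyLayer(1.5), ToyLayer(2.0)], inputs, False)
```

In this toy setup the first layer always sees the same inputs, while the second layer sees quantized values only when the flag is True.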

local_hessian_calibrate(model, forward_loop=None, distributed_sync=True, step_size=0.1, start_multiplier=0.25, stop_multiplier=4.0, fp8_scale_sweep=True, block_size=16, debug=False, shared_states=None)

Calibrate weight quantizers by minimizing the Hessian-weighted error.

Minimizes (W - Wq)ᵀ H (W - Wq), where the per-block Hessian H = ΣXᵀX makes the objective approximate the output error ||WX - WqX||². H is accumulated during a forward pass with weight fake-quant disabled (input quantizers are left untouched) and is then passed to mse_calibrate()’s weight search via error_func.
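
The link between the Hessian-weighted objective and the output error can be checked numerically. The sketch below treats a single weight row and ignores blocking, taking the rows of xs as input samples so that H is the sum of their outer products. The function names and the sample values are illustrative assumptions, not part of this module.

```python
from typing import List

Matrix = List[List[float]]


def gram(xs: Matrix) -> Matrix:
    """H = sum over samples of x^T x, for row-vector samples x."""
    dim = len(xs[0])
    return [[sum(x[i] * x[j] for x in xs) for j in range(dim)] for i in range(dim)]


def hessian_weighted_error(dw: List[float], h: Matrix) -> float:
    """dw^T H dw for a single weight row."""
    n = len(dw)
    return sum(dw[i] * h[i][j] * dw[j] for i in range(n) for j in range(n))


def output_error(w: List[float], wq: List[float], xs: Matrix) -> float:
    """Sum over samples of (x . w - x . wq)^2."""
    total = 0.0
    for x in xs:
        y = sum(xi * wi for xi, wi in zip(x, w))
        yq = sum(xi * wi for xi, wi in zip(x, wq))
        total += (y - yq) ** 2
    return total


xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # three calibration samples
w = [0.30, -0.70]                            # original weight row
wq = [0.25, -0.75]                           # hypothetical quantized row
dw = [a - b for a, b in zip(w, wq)]

h = gram(xs)
lhs = hessian_weighted_error(dw, h)   # (W - Wq)^T H (W - Wq)
rhs = output_error(w, wq, xs)         # ||WX - WqX||^2
```

For a linear layer the two quantities agree up to floating-point error, which is why the Hessian metric can stand in for the output error during the weight search.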

Like mse_calibrate(), this function calibrates TensorQuantizer weights. The Hessian metric is used where a weight pairs with its input activations (dense linears and HF fused-MoE experts); plain MSE is used otherwise. Other quantizer types (e.g. SequentialQuantizer) are unsupported and left at their max-calibrated scale.

Parameters:

model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model. Required for this algorithm.
distributed_sync (bool) – Whether to sync amax across distributed processes.
step_size (float) – Step size for amax search (default: 0.1).
start_multiplier (float) – Starting multiplier for amax search (default: 0.25).
stop_multiplier (float) – Ending multiplier for amax search (default: 4.0).
fp8_scale_sweep (bool) – If True, sweep over all 128 possible FP8 E4M3 scale values for NVFP4 per-block quantization (default: True).
block_size (int) – Block size for local Hessian computation (default: 16).
debug (bool) – If True, retain the per-quantizer Hessian accumulators on the model (model._local_hessian_accumulators) for inspection.
shared_states (Mapping[str, Mapping[str, Sequence[str]]] | None)

See LocalHessianCalibConfig for details on the configuration options.
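
The step_size, start_multiplier, and stop_multiplier parameters describe a sweep over candidate amax values. The sketch below assumes the candidates are evenly spaced multiples of the max-calibrated amax, which is an assumption about the search rather than something this page states. candidate_multipliers, fake_quant, and search_amax are hypothetical, and plain squared error stands in for the Hessian metric.

```python
from typing import List, Tuple


def candidate_multipliers(start: float = 0.25, stop: float = 4.0, step: float = 0.1) -> List[float]:
    """Evenly spaced multipliers from start up to (at most) stop."""
    count = int((stop - start) / step + 1e-9) + 1  # epsilon guards float error
    return [start + i * step for i in range(count)]


def fake_quant(x: float, amax: float, qmax: int = 7) -> float:
    """Symmetric integer fake quantization with 2 * qmax + 1 levels."""
    scale = amax / qmax
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def search_amax(weights: List[float], multipliers: List[float], qmax: int = 7) -> Tuple[float, float]:
    """Return (amax, error) for the candidate with the lowest squared error."""
    base_amax = max(abs(w) for w in weights)  # the max-calibrated starting point
    best_amax, best_err = base_amax, float("inf")
    for m in multipliers:
        amax = m * base_amax
        err = sum((w - fake_quant(w, amax, qmax)) ** 2 for w in weights)
        if err < best_err:
            best_amax, best_err = amax, err
    return best_amax, best_err


weights = [0.1, -0.2, 0.15, 0.05, 4.0]  # one outlier dominates the max
multipliers = candidate_multipliers(0.5, 1.5, 0.25)
best_amax, best_err = search_amax(weights, multipliers)
baseline_err = sum((w - fake_quant(w, 4.0)) ** 2 for w in weights)
```

Because a multiplier of 1.0 is among the candidates here, the selected amax can never do worse than the max-calibrated baseline under the chosen error metric.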

max_calibrate(model, forward_loop=None, distributed_sync=True, sync_expert_weight_amax=False, shared_states=None)

Calibrate the model using max calibration, i.e. from the maximum absolute values observed by each quantizer.

Parameters:

model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
distributed_sync (bool) – Whether to sync input_quantizer amax across distributed processes.
sync_expert_weight_amax (bool) – SequentialMLP only: share one weight amax across all experts in a MoE layer (within-rank sync, plus an EP all-reduce when EP>1).
shared_states (Mapping[str, Mapping[str, Sequence[str]]] | None) – Optional dict keyed by shared-state name. "weight_global_amax" is implemented today and accepts {"patterns": [...]}; omitted patterns use SHARED_PATTERNS, while an empty list disables the state.

See MaxCalibConfig for details on the remaining arguments.
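
As a rough illustration of what max calibration computes, the sketch below tracks a running maximum of absolute values across calibration batches and then combines per-process results. ToyMaxCalibrator and all_reduce_max are hypothetical stand-ins, and treating the distributed sync as a max-reduction is an assumption made for this example.

```python
from typing import List, Optional


class ToyMaxCalibrator:
    """Tracks the largest absolute value seen across all collected batches."""

    def __init__(self) -> None:
        self.amax: Optional[float] = None

    def collect(self, batch: List[float]) -> None:
        batch_max = max(abs(v) for v in batch)
        self.amax = batch_max if self.amax is None else max(self.amax, batch_max)


def all_reduce_max(per_rank_amax: List[float]) -> float:
    """Single-process stand-in for a max all-reduce across ranks."""
    return max(per_rank_amax)


calibrator = ToyMaxCalibrator()
for batch in ([0.5, -3.0], [2.0, 1.0]):
    calibrator.collect(batch)

local_amax = calibrator.amax                       # amax on this "rank"
synced_amax = all_reduce_max([local_amax, 4.5])    # 4.5 from a pretend second rank
```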

smoothquant(model, forward_loop=None, alpha=1.0)

Smooth-Quant variant with per-channel weight scaling.

Parameters:

model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.

See SmoothQuantCalibConfig for details on the remaining arguments.
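
For orientation, the sketch below applies the smoothing rule from the original SmoothQuant formulation, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), and checks that moving the scale from activations to weights leaves the layer output unchanged. The variant documented here may compute its scales differently; smoothing_scales and matmul are hypothetical helpers and the data is made up.

```python
from typing import List

Matrix = List[List[float]]


def smoothing_scales(xs: Matrix, w: Matrix, alpha: float) -> List[float]:
    """Per-input-channel scale: max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    scales = []
    for j in range(len(w)):
        act_max = max(abs(x[j]) for x in xs)
        w_max = max(abs(v) for v in w[j])
        scales.append(act_max ** alpha / w_max ** (1.0 - alpha))
    return scales


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


xs = [[1.0, 8.0], [-2.0, 4.0]]     # activations: rows are samples, columns are channels
w = [[0.5, -1.0], [0.25, 0.5]]     # weights: rows are input channels

scales = smoothing_scales(xs, w, alpha=1.0)
xs_smooth = [[x[j] / scales[j] for j in range(len(scales))] for x in xs]
w_smooth = [[v * scales[j] for v in w[j]] for j in range(len(scales))]

y = matmul(xs, w)
y_smooth = matmul(xs_smooth, w_smooth)
```

With alpha=1.0 the scale depends only on the activation maxima, so every smoothed activation channel ends up with a maximum magnitude of 1 and the range is moved entirely into the weights.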

svdquant(model, forward_loop=None, lowrank=32, **kwargs)

Lite version of SVDQuant.

Parameters:

model (Module) – Model to be calibrated.
forward_loop (Callable[[Module], None] | None) – A callable which takes the model as argument and forwards calibration data through the model.
lowrank (int)

See SVDQuantConfig for details on the remaining arguments.