model_quant

User-facing quantization API.

Functions

quantize

Quantizes and calibrates the model.

disable_quantizer

Disable quantizer by wildcard or filter function.

enable_quantizer

Enable quantizer by wildcard or filter function.

print_quant_summary

Print summary of all quantizer modules in the model.

fold_weight

Fold weight quantizer for fast evaluation.

disable_quantizer(model, wildcard_or_filter_func)

Disable quantizer by wildcard or filter function.

Parameters:
  • model (Module) – The model containing quantizer modules.

  • wildcard_or_filter_func (str | Callable) – A wildcard string matched against quantizer module names, or a filter function that takes a quantizer module name and returns True for quantizers to disable.

enable_quantizer(model, wildcard_or_filter_func)

Enable quantizer by wildcard or filter function.

Parameters:
  • model (Module) – The model containing quantizer modules.

  • wildcard_or_filter_func (str | Callable) – A wildcard string matched against quantizer module names, or a filter function that takes a quantizer module name and returns True for quantizers to enable.

fold_weight(model)

Fold weight quantizer for fast evaluation.

Parameters:

model (Module) – The model whose weight quantizers will be folded into the weights.

print_quant_summary(model)

Print summary of all quantizer modules in the model.

Parameters:

model (Module) – The model whose quantizer modules will be summarized.

quantize(model, config, forward_loop=None)

Quantizes and calibrates the model.

This method replaces modules with their quantized counterparts and performs calibration as specified by the "quant_cfg" key of config. forward_loop is used to forward calibration data through the model and gather statistics for calibration.

Parameters:
  • model (Module) – A PyTorch model

  • config (Dict[str, Any]) –

    A dictionary specifying values for the keys "quant_cfg" and "algorithm". The "quant_cfg" key specifies the quantization configuration, and the "algorithm" key specifies the algorithm argument passed to calibrate.

    The quantization configuration is a dictionary mapping wildcards or filter functions to quantizer attributes. The wildcards or filter functions are matched against the quantizer module names. Quantizer module names end with weight_quantizer or input_quantizer; these modules perform weight quantization and input (activation) quantization, respectively. Quantizer modules are instances of TensorQuantizer, and the specified attributes describe their quantization behavior. See set_quantizer_by_cfg for more details on the "quant_cfg" dictionary.
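    The naming convention above means a single wildcard can select a whole class of quantizers. A small sketch, assuming fnmatch-style wildcard semantics and hypothetical module names:

```python
from fnmatch import fnmatch

# Hypothetical quantizer module names following the naming convention above.
names = [
    "layers.0.attn.weight_quantizer",
    "layers.0.attn.input_quantizer",
    "layers.0.mlp.weight_quantizer",
]

# "*weight_quantizer" selects every weight quantizer in the model.
matched = [n for n in names if fnmatch(n, "*weight_quantizer")]
```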

    An example config dictionary is given below:
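    A minimal sketch of such a config; the wildcard keys follow the naming convention above, while the quantizer attribute names (num_bits, axis, enable) and the "max" algorithm value are illustrative assumptions, not a definitive schema:

```python
# Sketch of a quantize() config dictionary (attribute names are assumed).
config = {
    "quant_cfg": {
        # Match all weight quantizers: 8-bit, per-channel along axis 0.
        "*weight_quantizer": {"num_bits": 8, "axis": 0},
        # Match all input (activation) quantizers: 8-bit, per-tensor.
        "*input_quantizer": {"num_bits": 8, "axis": None},
        # Hypothetical pattern disabling quantization for matched modules.
        "*lm_head*": {"enable": False},
    },
    "algorithm": "max",
}
```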

    Please see config for more examples.

  • forward_loop (Callable[[Module], None] | None) –

    A callable that forwards all calibration data through the model; it is used to gather statistics for calibration. It should take model as its only argument and does not need to return anything. A few examples of correct forward_loop definitions follow.

    Example 1:

    def forward_loop(model) -> None:
        # iterate over the data loader and forward data through the model
        for batch in data_loader:
            model(batch)
    

    Example 2:

    def forward_loop(model) -> float:
        # evaluate the model on the task
        return evaluate(model, task, ....)
    

    Example 3:

    def forward_loop(model) -> None:
        # run evaluation pipeline
        evaluator.model = model
        evaluator.evaluate()
    

    Note

    Calibration does not require forwarding the entire dataset through the model. Please subsample the dataset or reduce the number of batches if needed.
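    One generic way to subsample is to cap the number of batches forwarded; a sketch, where data_loader stands in for the user's calibration data loader:

```python
from itertools import islice

# Stand-in for the user's calibration data loader (hypothetical).
data_loader = [f"batch_{i}" for i in range(1000)]

def forward_loop(model) -> None:
    # Forward only the first 64 batches rather than the full dataset.
    for batch in islice(data_loader, 64):
        model(batch)
```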

Return type:

Module

Returns: A PyTorch model that has been quantized and calibrated.