model_quant
User-facing quantization API.
Functions

- quantize(model, config, forward_loop=None) – Quantizes and calibrates the model.
- disable_quantizer(model, wildcard_or_filter_func) – Disable quantizer by wildcard or filter function.
- enable_quantizer(model, wildcard_or_filter_func) – Enable quantizer by wildcard or filter function.
- print_quant_summary(model) – Print summary of all quantizer modules in the model.
- fold_weight(model) – Fold weight quantizer for fast evaluation.
- disable_quantizer(model, wildcard_or_filter_func)
Disable quantizer by wildcard or filter function.
- Parameters:
model (Module) –
wildcard_or_filter_func (str | Callable) –
- enable_quantizer(model, wildcard_or_filter_func)
Enable quantizer by wildcard or filter function.
- Parameters:
model (Module) –
wildcard_or_filter_func (str | Callable) –
- fold_weight(model)
Fold weight quantizer for fast evaluation.
- Parameters:
model (Module) –
- print_quant_summary(model)
Print summary of all quantizer modules in the model.
- Parameters:
model (Module) –
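The wildcard_or_filter_func argument accepted by disable_quantizer and enable_quantizer is matched against quantizer module names. As an illustration only (shell-style matching via Python's fnmatch is an assumption here, not a statement about the library's exact matching semantics), the two selection styles behave roughly like this:

```python
from fnmatch import fnmatch

# Hypothetical quantizer module names, following the naming convention
# described under quantize(): names end in weight_quantizer / input_quantizer.
quantizer_names = [
    "backbone.layer1.weight_quantizer",
    "backbone.layer1.input_quantizer",
    "head.fc.weight_quantizer",
]

# A wildcard such as "*weight_quantizer" selects every weight quantizer.
matched = [n for n in quantizer_names if fnmatch(n, "*weight_quantizer")]

# A filter function allows arbitrary logic, e.g. restricting to one submodule.
def head_only(name: str) -> bool:
    return name.startswith("head.")

filtered = [n for n in quantizer_names if head_only(n)]
```

A wildcard is convenient for broad patterns; a filter function is the escape hatch when selection depends on more than the name suffix.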
- quantize(model, config, forward_loop=None)
Quantizes and calibrates the model.
This method replaces modules with their quantized counterparts and performs calibration as specified by quant_cfg. forward_loop is used to forward data through the model and gather statistics for calibration.
- Parameters:
model (Module) – A pytorch model
config (Dict[str, Any]) – A dictionary specifying the values for the keys "quant_cfg" and "algorithm". The "quant_cfg" key specifies the quantization configurations; the "algorithm" key specifies the algorithm argument to calibrate. The quantization configuration is a dictionary mapping wildcards or filter functions to quantizer attributes. The wildcards or filter functions are matched against the quantizer module names. The quantizer modules have names ending with weight_quantizer and input_quantizer, and they perform weight quantization and input quantization (or activation quantization) respectively. The quantizer modules are instances of TensorQuantizer, and the specified quantizer attributes describe their quantization behavior. See set_quantizer_by_cfg for more details on the "quant_cfg" dictionary, and see config for more examples.
forward_loop (Callable[[Module], None] | None) –
A callable that forwards all calibration data through the model. This is used to gather statistics for calibration. It should take model as its only argument and does not need to return anything. Here are a few examples of correct forward_loop definitions:

Example 1:

```python
def forward_loop(model) -> None:
    # iterate over the data loader and forward data through the model
    for batch in data_loader:
        model(batch)
```

Example 2:

```python
def forward_loop(model) -> float:
    # evaluate the model on the task
    return evaluate(model, task, ...)
```

Example 3:

```python
def forward_loop(model) -> None:
    # run evaluation pipeline
    evaluator.model = model
    evaluator.evaluate()
```
Note
Calibration does not require forwarding the entire dataset through the model. Please subsample the dataset or reduce the number of batches if needed.
- Return type:
Module
Returns: A pytorch model which has been quantized and calibrated.
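Putting the pieces together, a minimal calibration setup might look like the sketch below. The dictionary shape follows the "quant_cfg"/"algorithm" description above, but the specific quantizer attributes (num_bits, axis), the "max" algorithm choice, and calibration_batches are illustrative assumptions, not values prescribed by this reference:

```python
# Sketch of the inputs to quantize(); attribute names and values are
# illustrative assumptions, not prescribed by this API reference.
config = {
    "quant_cfg": {
        # wildcard -> quantizer attributes, matched against quantizer names
        "*weight_quantizer": {"num_bits": 8, "axis": 0},
        "*input_quantizer": {"num_bits": 8, "axis": None},
    },
    "algorithm": "max",  # forwarded as the algorithm argument to calibrate
}

def forward_loop(model) -> None:
    # Forward a subsample of the calibration data; per the Note above,
    # the full dataset is not required for calibration.
    for batch in calibration_batches:
        model(batch)

# With the library installed, the call would then be:
# import modelopt.torch.quantization as mtq
# model = mtq.quantize(model, config, forward_loop)
```

The quantize call is left commented out because it requires an installed library and a real model; the point of the sketch is the shape of config and forward_loop.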