algorithms

Module for advanced quantization algorithms.

Classes

AutoQuantizeSearcher

A searcher for the AutoQuantize algorithm.

QuantRecipe

A recipe built on QuantizeConfig, enabling auto_quantize-specific configurations.

QuantRecipeHparam

An Hparam for quantization recipes.

class AutoQuantizeSearcher

Bases: BaseSearcher

A searcher for the AutoQuantize algorithm.

In AutoQuantize, we search for the best per-layer quantization configuration that minimizes the sum of per-layer scores while meeting the specified constraint. AutoQuantize uses a linear programming solver to find the optimal quantization configuration.

The auto_quantize score for a layer quantization configuration is an approximation of the change in model loss caused by quantizing that particular layer with that particular configuration. The approximation is based on a Taylor expansion of the loss function with respect to the quantized output of the layer, substituting the Fisher information for the Hessian. This approximation is mathematically justified for models whose loss is a log-likelihood loss, such as BERT, GPT, etc. However, the auto_quantize score can still be used as a proxy for other models such as ResNet.
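One way to write this approximation (an illustrative sketch of the second-order Taylor term with a diagonal Fisher substitution; the exact expression used in the implementation may differ):

    \mathrm{score}_\ell(q) \;\approx\; \frac{1}{2} \sum_i \left( \frac{\partial \mathcal{L}}{\partial y_{\ell,i}} \right)^{2} \left( y^{q}_{\ell,i} - y_{\ell,i} \right)^{2}

where y_\ell is the layer's unquantized output, y^q_\ell is its output under quantization configuration q, and the squared output gradients play the role of the (diagonal) Fisher information substituted for the Hessian.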

before_search()

Prepare the model for search by calibrating the quantizers and collecting the AutoQuantize score.

best: Dict[str, Any]
candidate_stats: Dict[str, Dict[str, List[float]]]
property default_search_config

Get the default config for the searcher.

property default_state_dict: Dict[str, Any]

Get the default state dict for AutoQuantize.

gradient_checkpointing_enable_contexts: List[Tuple[Callable, Callable]] = [(<function _is_supported_hf_model>, <function setup_model_for_gradient_checkpointing>)]
classmethod insert_hparams_after_merge_rules(model, quant_recipes)

Restrict the search space using the merge rules and insert the hparams for the model.
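A hypothetical direct call (module path and format name are assumptions; in normal use the searcher invokes this itself):

    from modelopt.torch.quantization.algorithms import AutoQuantizeSearcher, QuantRecipe  # assumed module path

    # `model` stands for your torch model (not constructed in this sketch).
    quant_recipes = [QuantRecipe("FP8_DEFAULT_CFG"), QuantRecipe(None)]  # assumed format name; None assumed to mean "unquantized"
    AutoQuantizeSearcher.insert_hparams_after_merge_rules(model, quant_recipes)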

classmethod register_gradient_checkpointing_enable_context(is_supported_checker, context)

Register a gradient checkpointing enable context for AutoQuantize score estimation.

If is_supported_checker(model) returns True, context(model) will be used to enable gradient checkpointing (see the registration sketch below).

Parameters:
  • is_supported_checker (Callable) –

  • context (Callable) –
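A minimal registration sketch, assuming a hypothetical custom model class and a hypothetical context manager that toggles its gradient checkpointing (the module path and the model's checkpointing methods are assumptions):

    from contextlib import contextmanager
    import torch
    from modelopt.torch.quantization.algorithms import AutoQuantizeSearcher  # assumed module path

    class MyCustomTransformer(torch.nn.Module):
        """Hypothetical model class whose gradient checkpointing AutoQuantize does not know about."""

    @contextmanager
    def enable_my_checkpointing(model):
        # Hypothetical: turn gradient checkpointing on while AutoQuantize collects scores.
        model.gradient_checkpointing_enable()      # assumed method on the custom model
        try:
            yield
        finally:
            model.gradient_checkpointing_disable() # assumed method on the custom model

    AutoQuantizeSearcher.register_gradient_checkpointing_enable_context(
        is_supported_checker=lambda m: isinstance(m, MyCustomTransformer),
        context=enable_my_checkpointing,
    )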

rules = ['^(.*?)\\.(q_proj|k_proj|v_proj)$', '^(.*?)\\.(gate_proj|up_proj)$', '^(.*?)\\.(\\d+\\.(w1|w2|w3))$', '^(.*?)\\.((w1_linear|w2_linear|w3_linear)\\.\\d+)$']
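These merge rules are regular expressions; modules whose names match the same rule with the same leading capture group (for example the q/k/v projections of one attention block) are presumably grouped so they share a single quantization decision. A small illustrative sketch of how the first rule groups names (the layer names are made up):

    import re

    rule = r"^(.*?)\.(q_proj|k_proj|v_proj)$"
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.self_attn.k_proj",
        "model.layers.0.self_attn.v_proj",
    ]
    # All three names share the prefix "model.layers.0.self_attn",
    # so they would be merged into a single quantization decision.
    prefixes = {re.match(rule, n).group(1) for n in names}
    assert prefixes == {"model.layers.0.self_attn"}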

run_search()

Search for the best per-layer quantization configuration and return the best model and configuration.

AutoQuantize uses a linear programming solver to find the optimal quantization configuration, which minimizes the sum of per-layer auto_quantize scores while meeting the specified constraint.
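The selection problem can be sketched as a small 0/1 program: pick exactly one recipe per layer, minimize the summed scores, and keep the total cost under the constraint. The sketch below uses the pulp package with made-up scores and costs purely for illustration; modelopt's actual solver and constraint bookkeeping may differ.

    import pulp

    # Hypothetical per-layer data: scores[layer][recipe] and costs (e.g. weight bytes) per recipe.
    scores = {"layer0": {"int4": 3.0, "int8": 1.0}, "layer1": {"int4": 5.0, "int8": 2.0}}
    costs = {"layer0": {"int4": 10, "int8": 20}, "layer1": {"int4": 10, "int8": 20}}
    budget = 50  # assumed constraint on total cost

    prob = pulp.LpProblem("auto_quantize_sketch", pulp.LpMinimize)
    x = {
        (l, r): pulp.LpVariable(f"x_{l}_{r}", cat="Binary")
        for l in scores for r in scores[l]
    }
    # Objective: minimize the summed per-layer scores of the chosen recipes.
    prob += pulp.lpSum(scores[l][r] * x[l, r] for (l, r) in x)
    # Exactly one recipe per layer.
    for l in scores:
        prob += pulp.lpSum(x[l, r] for r in scores[l]) == 1
    # Stay within the constraint.
    prob += pulp.lpSum(costs[l][r] * x[l, r] for (l, r) in x) <= budget
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    chosen = {l: r for (l, r) in x if x[l, r].value() > 0.5}
    print(chosen)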

sanitize_search_config(config)

Sanitize the search config dict.

Parameters:

config (Dict[str, Any] | None) –

Return type:

Dict[str, Any]

class QuantRecipe

Bases: CustomHPType

A recipe built on QuantizeConfig, enabling auto_quantize-specific configurations.

__init__(name=None)

Initialize the QuantRecipe with the name of the quantization format.

Parameters:

name (str | None) –

property compression: float

Get the compression factor for the quantization format.

property config: QuantizeConfig

Get the quantization configuration for the quantization format.

static disable_folding_pqs_to_weights()

Disable the folding of pre_quant_scale to weights.

static fold_pqs_to_weights(model)

Fold the pre_quant_scale in weight_quantizers to weights.

property num_bits: int

Get the number of bits for the quantization format.
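A brief usage sketch (the module path and the format name are assumptions; substitute a format supported by your installation):

    from modelopt.torch.quantization.algorithms import QuantRecipe  # assumed module path

    recipe = QuantRecipe(name="FP8_DEFAULT_CFG")  # assumed format name
    print(recipe.num_bits)      # number of bits of the format
    print(recipe.compression)   # compression factor of the format
    print(recipe.config)        # the underlying QuantizeConfig

    # After a configuration is selected, pre_quant_scale factors can be folded into the weights:
    # QuantRecipe.fold_pqs_to_weights(model)   # `model` is your quantized torch model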

class QuantRecipeHparam

Bases: Hparam

An Hparam for quantization recipes.

In addition, this Hparam also:

1. Keeps a link to its modules and sets the quantizers for the module based on the active recipe.
2. Keeps track of the importance of each recipe in a dict instead of a tensor.

__init__(choices, original=None, nn_modules=None)

Initializes Hparam with original value and choices.

Parameters:
Return type:

None

property active: Tuple[int, ...] | int | float | CustomHPType

Return the currently active value.

property importance: Dict

Return the importance dict mapping each recipe to its importance.
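QuantRecipeHparam instances are normally created for you (for example via AutoQuantizeSearcher.insert_hparams_after_merge_rules); the sketch below only illustrates the interface described above, with an assumed module path and format name:

    from modelopt.torch.quantization.algorithms import QuantRecipe, QuantRecipeHparam  # assumed module path

    # Assumed format name; QuantRecipe(None) is assumed to denote the unquantized choice.
    choices = [QuantRecipe("FP8_DEFAULT_CFG"), QuantRecipe(None)]
    hparam = QuantRecipeHparam(choices, original=choices[-1])  # nn_modules omitted in this sketch

    hparam.active      # the currently active recipe (linked modules' quantizers follow it, per the description above)
    hparam.importance  # dict mapping each recipe to its importance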