algorithms
Module for advanced quantization algorithms.
Classes

- AutoQuantizeSearcher: A searcher for the AutoQuantize algorithm.
- QuantRecipe: A subclass of QuantizeConfig enabling auto_quantize specific configurations.
- QuantRecipeHparam: An Hparam for quantization recipes.
- class AutoQuantizeSearcher
Bases:
BaseSearcher
A searcher for AutoQuantize algorithm.
In AutoQuantize, we search for the best per-layer quantization configuration that minimizes the sum of per-layer scores while meeting the specified constraint. AutoQuantize uses a Linear Programming solver to find the optimal quantization configuration.
The auto_quantize score for a layer's quantization configuration approximates the change in model loss due to quantizing that layer with that configuration. The approximation is based on a Taylor expansion of the loss function with respect to the quantized output of the layer, substituting the Fisher information for the Hessian. This approximation is mathematically correct for models whose loss is a log-likelihood loss, such as BERT, GPT, etc. However, the auto_quantize score can still be used as a proxy for other models such as ResNet.
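The selection step can be illustrated with a small, self-contained sketch. This is not the library's actual Linear Programming solver; it brute-forces the per-layer recipe choices, but it optimizes the same objective: minimize the summed score subject to a total-cost constraint. The recipe names, scores, and costs below are made up for illustration.

```python
from itertools import product

# Hypothetical per-layer candidates: (recipe_name, score, cost).
# "score" approximates the loss increase from quantizing the layer with
# that recipe; "cost" stands in for the constrained quantity (e.g. bits).
layer_candidates = [
    [("fp8", 0.10, 8), ("int4", 0.50, 4)],   # layer 0
    [("fp8", 0.05, 8), ("int4", 0.90, 4)],   # layer 1
    [("fp8", 0.30, 8), ("int4", 0.20, 4)],   # layer 2
]

def search(candidates, cost_budget):
    """Return the per-layer recipe assignment that minimizes the summed
    score while keeping the total cost within the budget."""
    best_choice, best_score = None, float("inf")
    for choice in product(*candidates):
        total_cost = sum(c[2] for c in choice)
        total_score = sum(c[1] for c in choice)
        if total_cost <= cost_budget and total_score < best_score:
            best_choice, best_score = choice, total_score
    return best_choice, best_score

choice, score = search(layer_candidates, cost_budget=20)
print([c[0] for c in choice])  # ['fp8', 'fp8', 'int4']
```

With a budget of 20, at least one layer must take the cheaper recipe; the search picks the layer whose score penalty for doing so is smallest (layer 2 here). The real solver handles this combinatorial choice at scale via linear programming rather than enumeration.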
- before_search()
Prepare the model for search by calibrating the quantizers and collecting the AutoQuantize score.
- best: Dict[str, Any]
- candidate_stats: Dict[str, Dict[str, List[float]]]
- property default_search_config
Get the default config for the searcher.
- property default_state_dict: Dict[str, Any]
Get the default state dict for AutoQuantize.
- gradient_checkpointing_enable_contexts: List[Tuple[Callable, Callable]] = [(<function _is_supported_hf_model>, <function setup_model_for_gradient_checkpointing>)]
- classmethod insert_hparams_after_merge_rules(model, quant_recipes)
Restrict the search space using the merge rules and insert the hparams for the model.
- classmethod register_gradient_checkpointing_enable_context(is_supported_checker, context)
Register a gradient checkpointing enable context for AutoQuantize score estimation.
If is_supported_checker(model) returns True, context(model) will be used to enable gradient checkpointing.
- Parameters:
is_supported_checker (Callable) –
context (Callable) –
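The checker/context registry described above can be sketched as follows. This is an illustrative reimplementation of the pattern, not the library's internals: the registry holds (is_supported_checker, context) pairs, and the first checker that accepts the model selects the context used while scores are estimated. All names here are hypothetical.

```python
from contextlib import contextmanager

# Illustrative registry: each entry pairs an is_supported_checker with a
# context factory, mirroring register_gradient_checkpointing_enable_context.
_contexts = []

def register(is_supported_checker, context):
    _contexts.append((is_supported_checker, context))

@contextmanager
def gradient_checkpointing(model):
    # Use the first registered context whose checker accepts the model;
    # fall back to a no-op otherwise.
    for is_supported, context in _contexts:
        if is_supported(model):
            with context(model):
                yield
            return
    yield

@contextmanager
def _toggle_context(model):
    # Enable checkpointing on entry, restore on exit.
    model.gradient_checkpointing = True
    try:
        yield
    finally:
        model.gradient_checkpointing = False

register(lambda m: hasattr(m, "gradient_checkpointing"), _toggle_context)

class DummyModel:
    gradient_checkpointing = False

m = DummyModel()
with gradient_checkpointing(m):
    inside = m.gradient_checkpointing  # True while inside the context
after = m.gradient_checkpointing       # restored to False on exit
```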
- rules = ['^(.*?)\\.(q_proj|k_proj|v_proj)$', '^(.*?)\\.(gate_proj|up_proj)$', '^(.*?)\\.(\\d+\\.(w1|w2|w3))$', '^(.*?)\\.((w1_linear|w2_linear|w3_linear)\\.\\d+)$']
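The rules above restrict the search space by merging sibling projections (e.g. q_proj/k_proj/v_proj of one attention block) so they share a single quantization hparam, as done by insert_hparams_after_merge_rules. A small sketch of how such regex-based grouping could work (the grouping function is illustrative, not the library's implementation):

```python
import re

# Two of the documented merge rules: siblings matching the same rule
# under the same prefix are merged into one search-space group.
rules = [
    r"^(.*?)\.(q_proj|k_proj|v_proj)$",
    r"^(.*?)\.(gate_proj|up_proj)$",
]

def group_key(module_name):
    """Map a module name to its merge group; unmatched names form
    singleton groups keyed by their own name."""
    for rule in rules:
        m = re.match(rule, module_name)
        if m:
            return (m.group(1), rule)  # shared prefix + rule = one group
    return (module_name, None)

names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.down_proj",
]
groups = {}
for n in names:
    groups.setdefault(group_key(n), []).append(n)
print(len(groups))  # 2: q/k/v merged into one group, down_proj alone
```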
- run_search()
Search for the best per-layer quantization configuration and return the best model and configuration.
AutoQuantize uses a Linear Programming solver to find the optimal quantization configuration, which minimizes the sum of per-layer auto_quantize scores while meeting the specified constraint.
- sanitize_search_config(config)
Sanitize the search config dict.
- Parameters:
config (Dict[str, Any] | None) –
- Return type:
Dict[str, Any]
- class QuantRecipe
Bases:
CustomHPType
A subclass of QuantizeConfig enabling auto_quantize specific configurations.
- __init__(name=None)
Initialize the QuantRecipe with the name of the quantization format.
- Parameters:
name (str | None) –
- property compression: float
Get the compression factor for the quantization format.
- property config: QuantizeConfig
Get the quantization configuration for the quantization format.
- static disable_folding_pqs_to_weights()
Disable the folding of pre_quant_scale to weights.
- static fold_pqs_to_weights(model)
Fold the pre_quant_scale in weight_quantizers to weights.
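The invariance that makes such folding possible can be shown numerically: scaling input channel j by 1/s[j] while scaling the matching weight column by s[j] leaves the linear layer's output unchanged, so the per-channel scale can be absorbed into the weights. This sketch only demonstrates the arithmetic identity; the exact direction/sign convention the library uses for pre_quant_scale is not shown here.

```python
def matvec(W, x):
    # Plain Python matrix-vector product; rows of W are output channels.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W = [[1.0, 2.0], [3.0, 4.0]]   # 2x2 weight
x = [0.5, -1.0]                # input activations
s = [2.0, 4.0]                 # per-input-channel scale (illustrative)

# Scale the input down and fold the inverse into the weight columns.
x_scaled = [xi / si for xi, si in zip(x, s)]
W_folded = [[w * si for w, si in zip(row, s)] for row in W]

# The output is unchanged: the scale has been folded into the weights.
assert matvec(W_folded, x_scaled) == matvec(W, x)
```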
- property num_bits: int
Get the number of bits for the quantization format.
- class QuantRecipeHparam
Bases:
Hparam
An Hparam for quantization recipes.
In addition, this Hparam also:
1. Keeps a link to its modules and sets the quantizers for the module based on the active recipe.
2. Keeps track of the importance of each recipe in a dict instead of a tensor.
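These two behaviors can be sketched with a minimal stand-in class. The names and the string recipes below are illustrative, not the real QuantRecipeHparam API: the point is that setting the active recipe reconfigures the linked modules, and importances accumulate in a plain dict keyed by recipe.

```python
class TinyModule:
    """Stand-in for an nn.Module whose quantizers follow the hparam."""
    def __init__(self):
        self.quant_recipe = None

class TinyRecipeHparam:
    def __init__(self, choices, original=None, nn_modules=None):
        self.choices = list(choices)
        self.nn_modules = nn_modules or []
        # Importance tracked per recipe in a dict, not a tensor.
        self.importance = {c: 0.0 for c in self.choices}
        self.active = original if original is not None else self.choices[0]

    @property
    def active(self):
        return self._active

    @active.setter
    def active(self, recipe):
        # Switching the active recipe reconfigures all linked modules.
        assert recipe in self.choices
        self._active = recipe
        for m in self.nn_modules:
            m.quant_recipe = recipe

mods = [TinyModule(), TinyModule()]
hp = TinyRecipeHparam(["fp8", "int4"], nn_modules=mods)
hp.active = "int4"            # both modules now follow the int4 recipe
hp.importance["int4"] += 0.7  # e.g. accumulated from score estimation
```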
- __init__(choices, original=None, nn_modules=None)
Initializes Hparam with original value and choices.
- Parameters:
choices (Sequence[QuantRecipe]) –
original (QuantRecipe | None) –
nn_modules (List[Module] | None) –
- Return type:
None
- property active: Tuple[int, ...] | int | float | CustomHPType
Return the currently active value.
- property importance: Dict
Return the importance dict mapping each recipe to its importance.