utils
Quantization utilities.
Functions
- convert_quantization_axis_to_reduce_axis: Convert the quantization axis to the reduce axis.
- export_torch_mode: Context manager enabling the export mode.
- is_quantized: Check if a module is quantized.
- is_quantized_column_parallel_linear: Check if a module is a quantized column parallel linear module.
- is_quantized_linear: Check if a module is a quantized linear module.
- is_quantized_row_parallel_linear: Check if a module is a quantized row parallel linear module.
- reduce_amax: Compute the absolute maximum value of a tensor.
- reduce_sum: Compute the sum of a tensor along specified axes.
- replace_function: Replace a function with a new one within a context.
- representative_weight_quantizer: Return the representative weight quantizer for weight_name on module.
- update_quant_cfg_with_kv_cache_quant: Update the quant_cfg with the kv cache quant_cfg.
- weight_attr_names: Get the weight param attribute names in a converted module, non-recursive.
- convert_quantization_axis_to_reduce_axis(input, axis)
Convert the quantization axis to the reduce axis.
- Parameters:
input (torch.Tensor) – The input tensor.
axis (int, tuple, list, or None) – The quantization axis. None means per-tensor quantization.
- Returns:
The axis to reduce. None suggests all dimensions should be reduced.
- Return type:
list
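The axis conversion above can be sketched in plain Python. This is a hypothetical reimplementation for illustration: it takes the tensor rank (`ndim`) instead of the tensor itself, whereas the real helper accepts a `torch.Tensor`.

```python
def quant_axis_to_reduce_axis(ndim, axis):
    """Sketch: return the dims to reduce, given tensor rank and quantization axis.

    axis=None means per-tensor quantization, so every dim is reduced (None).
    Otherwise, every dim NOT in the quantization axis is reduced.
    """
    if axis is None:
        return None  # reduce all dimensions
    axes = [axis] if isinstance(axis, int) else list(axis)
    keep = {a % ndim for a in axes}  # normalize negative axes
    return [d for d in range(ndim) if d not in keep]

# Per-channel quantization along dim 0 of a rank-3 tensor reduces dims 1 and 2:
print(quant_axis_to_reduce_axis(3, 0))     # [1, 2]
# Per-tensor quantization reduces everything:
print(quant_axis_to_reduce_axis(3, None))  # None
```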
- export_torch_mode()
Context manager enabling the export mode.
- is_quantized(module)
Check if a module is quantized.
- is_quantized_column_parallel_linear(module)
Check if a module is a quantized column parallel linear module.
- is_quantized_linear(module)
Check if a module is a quantized linear module.
- is_quantized_row_parallel_linear(module)
Check if a module is a quantized row parallel linear module.
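A common way to implement predicates like these is duck-typing on quantizer attributes. The sketch below assumes a `TensorQuantizer` marker class and attribute-scanning logic; the library's actual checks may differ.

```python
class TensorQuantizer:  # minimal stand-in for the real quantizer class
    pass

def is_quantized(module):
    """Sketch: a module counts as quantized if any attribute is a TensorQuantizer."""
    return any(isinstance(v, TensorQuantizer) for v in vars(module).values())

class PlainLinear:
    def __init__(self):
        self.weight = [[1.0]]

class QuantLinear(PlainLinear):
    def __init__(self):
        super().__init__()
        self.weight_quantizer = TensorQuantizer()

print(is_quantized(PlainLinear()))  # False
print(is_quantized(QuantLinear()))  # True
```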
- reduce_amax(input, axis=None, keepdims=True, squeeze_scalar=True)
Compute the absolute maximum value of a tensor.
Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.
Note
Gradient computation is disabled as this function is never meant for learning.
- Parameters:
input – Input tensor
axis – The dimensions to reduce. None or int or tuple of ints. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).
keepdims – A boolean. If true, retains reduced dimensions with length 1. Default True
- Returns:
The reduced tensor.
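The amax reduction can be illustrated without torch on a 2-D list; this is a simplified sketch of the semantics (`reduce_sum` below behaves analogously, with a sum in place of the absolute maximum). The `squeeze_scalar` parameter is omitted here.

```python
def reduce_amax_2d(matrix, axis=None, keepdims=True):
    """Sketch of reduce_amax for a 2-D list: max of absolute values along an axis."""
    if axis is None:
        m = max(abs(v) for row in matrix for v in row)
        return [[m]] if keepdims else m
    if axis in (0, -2):  # reduce rows -> one amax per column
        vals = [max(abs(row[j]) for row in matrix) for j in range(len(matrix[0]))]
        return [vals] if keepdims else vals
    # axis in (1, -1): reduce columns -> one amax per row
    vals = [max(abs(v) for v in row) for row in matrix]
    return [[v] for v in vals] if keepdims else vals

x = [[1.0, -3.0], [2.0, 0.5]]
print(reduce_amax_2d(x, axis=0, keepdims=False))  # [2.0, 3.0]
print(reduce_amax_2d(x))                          # [[3.0]]
```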
- reduce_sum(input, axis=None, keepdims=True)
Compute the sum of a tensor along specified axes.
Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.
Note
Gradient computation is disabled as this function is never meant for learning.
- Parameters:
input – Input tensor
axis – The dimensions to reduce. None or int or tuple of ints. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).
keepdims – A boolean. If true, retains reduced dimensions with length 1. Default True
- Returns:
The reduced tensor.
- replace_function(package, name, new_func, og_func_cache_name=None)
Replace a function with a new one within a context.
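The function-replacement context can be sketched with `contextlib.contextmanager`: save the original attribute, install the replacement, and restore on exit. The real helper also accepts `og_func_cache_name`; that parameter is omitted in this sketch.

```python
from contextlib import contextmanager
import math

@contextmanager
def replace_function(package, name, new_func):
    """Sketch: swap package.name for new_func inside the context, then restore."""
    og_func = getattr(package, name)
    setattr(package, name, new_func)
    try:
        yield
    finally:
        setattr(package, name, og_func)  # restore even if the body raises

with replace_function(math, "sqrt", lambda x: 0.0):
    print(math.sqrt(9))  # 0.0
print(math.sqrt(9))      # 3.0
```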
- representative_weight_quantizer(module, weight_name='weight')
Return the representative weight quantizer for weight_name on module.
Handles two layouts:
singular
<name>_weight_quantizer: the standard nn.Linear / _QuantLinear layout.
plural
<name>_weight_quantizers (nn.ModuleList): fused-experts modules (_QuantFusedExperts) hold one TensorQuantizer per expert. Per-expert formats are identical, so the first element is representative.
Returns None if no matching quantizer is found.
- Parameters:
module (Module)
weight_name (str)
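The singular/plural lookup can be sketched as two attribute probes. The attribute naming below is simplified to `<weight_name>_quantizer(s)`; the real layouts described above use somewhat different suffixes, and the plural case holds real quantizer modules rather than strings.

```python
def representative_weight_quantizer(module, weight_name="weight"):
    """Sketch: return one representative quantizer for weight_name, or None."""
    singular = getattr(module, f"{weight_name}_quantizer", None)
    if singular is not None:
        return singular
    plural = getattr(module, f"{weight_name}_quantizers", None)  # ModuleList-like
    if plural:
        return plural[0]  # per-expert formats match, so the first is representative
    return None

class FusedExperts:  # stand-in for a fused-experts module
    def __init__(self):
        self.gate_up_proj_quantizers = ["expert0_q", "expert1_q"]

print(representative_weight_quantizer(FusedExperts(), "gate_up_proj"))  # expert0_q
print(representative_weight_quantizer(object(), "weight"))              # None
```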
- update_quant_cfg_with_kv_cache_quant(quant_cfg, kv_cache_quant_cfg)
Update the quant_cfg with the kv cache quant_cfg.
- Parameters:
quant_cfg (dict[str, Any]) – The outer quantization config dict (with "quant_cfg" and "algorithm" keys).
kv_cache_quant_cfg (list[QuantizerCfgEntry]) – A list of QuantizerCfgEntry dicts for KV cache quantization, typically some_kv_cfg["quant_cfg"].
- Returns:
A deep copy of quant_cfg with the KV cache entries appended to quant_cfg["quant_cfg"].
- Return type:
dict[str, Any]
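The deep-copy-and-append behavior can be sketched as below. This assumes `quant_cfg["quant_cfg"]` is a list of entry dicts and uses made-up pattern entries for the demo; the real QuantizerCfgEntry schema is not reproduced here.

```python
import copy

def update_quant_cfg_with_kv_cache_quant(quant_cfg, kv_cache_quant_cfg):
    """Sketch: deep-copy the outer config and append the KV cache entries."""
    merged = copy.deepcopy(quant_cfg)
    merged["quant_cfg"] = list(merged["quant_cfg"]) + list(kv_cache_quant_cfg)
    return merged

base = {"quant_cfg": [{"pattern": "*weight_quantizer"}], "algorithm": "max"}
kv = [{"pattern": "*k_bmm_quantizer"}, {"pattern": "*v_bmm_quantizer"}]
merged = update_quant_cfg_with_kv_cache_quant(base, kv)
print(len(merged["quant_cfg"]))  # 3
print(len(base["quant_cfg"]))    # 1 (the input is left untouched)
```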
- weight_attr_names(module)
Get the weight param attribute names in a converted module, non-recursive.
Covers three layouts:
standard
nn.Linear: weight + weight_quantizer.
custom per-weight quantizer
e.g. Llama4TextExperts with gate_up_proj + gate_up_proj_weight_quantizer.
fused-experts nn.ModuleList quantizers
_QuantFusedExperts with gate_up_proj + the plural gate_up_proj_weight_quantizers list.
- Parameters:
module (Module)
- Return type:
Generator[str, None, None]
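The non-recursive scan can be sketched as a generator over the module's direct attributes, yielding each name that has a paired quantizer attribute. The `<name>_quantizer(s)` suffix convention here is a simplification of the layouts listed above.

```python
def weight_attr_names(module):
    """Sketch: yield names of direct attributes that have a paired quantizer."""
    for name in vars(module):  # non-recursive: this module's own attributes only
        if hasattr(module, f"{name}_quantizer") or hasattr(module, f"{name}_quantizers"):
            yield name

class Converted:  # stand-in for a converted quantized module
    def __init__(self):
        self.weight = [[1.0]]
        self.weight_quantizer = object()
        self.bias = [0.0]  # no paired quantizer, so not yielded

print(list(weight_attr_names(Converted())))  # ['weight']
```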