scaling_factor_utils
Utils for scaling factor adjustments.
Functions

- adjust_attn_amax_values – Adjusts the amax values for the attention layers.
- convert_state_dict_amax_to_scales – Converts _amax keys in a quantized state dictionary to scale values and updates the state dictionary accordingly.
- get_weights_scaling_factor_and_amax – Calculates the weight scaling factors for a given group size.
- resmooth_and_get_scale_and_amax – Resmooths weights from a single rank or multiple ranks and returns scaling factors and amax values.
- adjust_attn_amax_values(module)
Adjusts the amax values for the attention layers.
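A minimal sketch of what an attention amax adjustment can look like. This assumes the adjustment unifies the calibrated amax values across the q/k/v projections so they share one quantization scale; the real function operates on a module in place, and the dict-based `qkv_amax` shape here is purely illustrative.

```python
def adjust_attn_amax_values(qkv_amax):
    """Hypothetical sketch: given per-projection amax values such as
    {'q': ..., 'k': ..., 'v': ...}, make every projection use the
    shared maximum so q/k/v quantize with a common scale.

    Assumption: 'adjusting' means taking the max across projections;
    the actual implementation works on an attention module's
    quantizer attributes rather than a plain dict.
    """
    shared = max(qkv_amax.values())
    return {name: shared for name in qkv_amax}
```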
- convert_state_dict_amax_to_scales(quantized_state_dict, maxbound, layers_quant_config)
Convert _amax keys in a quantized state dictionary to scale values and update the state dictionary accordingly.
- Parameters:
quantized_state_dict (dict) – The input state dictionary with quantized values.
maxbound (float) – The maximum bound value for the given quantization format.
layers_quant_config (dict | str) – Per-layer quantization format information: a dict mapping layers to formats for mixed-precision quantization, or a str naming the single quantization format for uniform quantization.
- Returns:
The updated state dictionary with converted scale values.
- Return type:
dict
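A simplified sketch of the amax-to-scale conversion, assuming the common relation `scale = amax / maxbound` and a key-renaming convention of `_amax` to `_scale`. It ignores the per-layer `layers_quant_config` handling of the real function, and the example `maxbound` of 448.0 (the FP8 E4M3 maximum) is only illustrative.

```python
def convert_state_dict_amax_to_scales(quantized_state_dict, maxbound):
    """Sketch: replace every key ending in '_amax' with a '_scale' key
    whose value is amax / maxbound, leaving other entries untouched.

    Assumptions: plain float values instead of tensors, and no
    per-layer quantization config (the real function also accepts
    layers_quant_config to vary maxbound per layer).
    """
    converted = {}
    for key, value in quantized_state_dict.items():
        if key.endswith("_amax"):
            # Derive the quantization scale from the calibrated amax.
            converted[key[: -len("_amax")] + "_scale"] = value / maxbound
        else:
            converted[key] = value
    return converted
```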
- get_weights_scaling_factor_and_amax(weight, group_size)
Calculate the weight scaling factors for a given group size.
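The per-group computation can be sketched as follows. This assumes block-wise quantization: the weight row is split into contiguous groups of `group_size`, each group's amax is its absolute maximum, and the scale is derived as `amax / maxbound`. Plain Python lists stand in for tensors, and the `maxbound` parameter is an illustrative addition.

```python
def get_weights_scaling_factor_and_amax(weight, group_size, maxbound=448.0):
    """Sketch: compute per-group scaling factors and amax values for a
    1-D weight row under block-wise quantization.

    Assumptions: weight is a flat list (not a tensor), its length is a
    multiple of group_size, and scale = amax / maxbound.
    """
    scales, amaxes = [], []
    for start in range(0, len(weight), group_size):
        group = weight[start:start + group_size]
        amax = max(abs(w) for w in group)  # absolute max of the block
        amaxes.append(amax)
        scales.append(amax / maxbound)
    return scales, amaxes
```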
- resmooth_and_get_scale_and_amax(merged_weights, pre_quant_scales, ranks, group_size, avg_pre_quant_scale=None)
Resmooths weights from a single rank or multiple ranks and gets scaling factors and amax values.
- Parameters:
merged_weights (Tensor) – Merged weights from ranks.
pre_quant_scales (List[Tensor]) – List of pre-quantization scales for each rank.
ranks (int) – Number of ranks.
group_size (int) – Group size of the quantization block.
avg_pre_quant_scale (Tensor, optional) – Average pre-quantization scale to resmooth with. If not provided, weights are resmoothed using the average of pre_quant_scales.
- Returns:
weights – Resmoothed weights.
weight_scaling_factors – Resmoothed scaling factors.
avg_pre_quant_scale – Calculated average of the pre-quantization scales.
amaxes – Amax values for the weights.
- Return type:
tuple
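A toy sketch of the resmoothing step, with assumptions stated up front: each rank's weights were smoothed by its own per-channel `pre_quant_scale`, so resmoothing divides that scale out and re-applies the element-wise average; `merged_weights` is modeled as a flat list of `ranks * channels` values; and per-group amax is computed over the resmoothed result. The layout and the direction of the rescaling are illustrative guesses, not the library's exact math, and the scale-factor derivation is omitted for brevity.

```python
def resmooth_and_get_scale_and_amax(merged_weights, pre_quant_scales,
                                    ranks, group_size):
    """Sketch: resmooth per-rank weights onto the average
    pre-quantization scale and compute per-group amax values.

    Assumptions: merged_weights is a flat list of ranks * channels
    floats, pre_quant_scales is one per-channel scale list per rank,
    and resmoothing is w / rank_scale * avg_scale.
    """
    channels = len(pre_quant_scales[0])
    # Element-wise average of the per-rank pre-quantization scales.
    avg_pre_quant_scale = [
        sum(scale[i] for scale in pre_quant_scales) / ranks
        for i in range(channels)
    ]
    # Divide out each rank's own scale, re-apply the shared average.
    resmoothed = []
    for r in range(ranks):
        for i in range(channels):
            w = merged_weights[r * channels + i]
            resmoothed.append(w / pre_quant_scales[r][i] * avg_pre_quant_scale[i])
    # Per-group amax over the resmoothed weights.
    amaxes = [
        max(abs(w) for w in resmoothed[s:s + group_size])
        for s in range(0, len(resmoothed), group_size)
    ]
    return resmoothed, amaxes, avg_pre_quant_scale
```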