scaling_factor_utils

Utilities for adjusting scaling factors.

Functions

adjust_attn_amax_values

Adjusts the amax values for the attention layers.

convert_state_dict_amax_to_scales

Convert _amax keys in a quantized state dictionary to scale values and update the state dictionary accordingly.

get_weights_scaling_factor_and_amax

Calculate the weight scaling factors for a given group size.

resmooth_and_get_scale_and_amax

Resmooths weights from one or multiple ranks and gets scaling factors and amax.

adjust_attn_amax_values(module)

Adjusts the amax values for the attention layers.
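A common adjustment for attention layers is to share a single amax across the query, key, and value projections so a fused QKV GEMM can use one quantization scale. The sketch below illustrates that idea; `unify_qkv_amax` is a hypothetical helper, not the module's actual implementation.

```python
import torch

def unify_qkv_amax(q_amax, k_amax, v_amax):
    # Hypothetical sketch: take the elementwise maximum of the q/k/v amax
    # values and share it across all three projections, so a fused QKV
    # projection can be quantized with a single scale.
    shared = torch.maximum(torch.maximum(q_amax, k_amax), v_amax)
    return shared, shared, shared
```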

convert_state_dict_amax_to_scales(quantized_state_dict, maxbound)

Convert _amax keys in a quantized state dictionary to scale values and update the state dictionary accordingly.

Parameters:
  • quantized_state_dict (dict) – The input state dictionary with quantized values.

  • maxbound (float) – The maximum bound value for the given quantization format.

Returns:

The updated state dictionary with converted scale values.

Return type:

dict
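The conversion can be sketched as follows, assuming the common convention `scale = amax / maxbound`; `convert_amax_to_scales_sketch` is an illustrative stand-in, not the library's implementation.

```python
import torch

def convert_amax_to_scales_sketch(state_dict, maxbound):
    # Hypothetical sketch: replace every "*_amax" entry with a "*_scale"
    # entry, assuming the convention scale = amax / maxbound.
    out = {}
    for key, value in state_dict.items():
        if key.endswith("_amax"):
            out[key[: -len("_amax")] + "_scale"] = value / maxbound
        else:
            out[key] = value
    return out
```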

get_weights_scaling_factor_and_amax(weight, group_size)

Calculate the weight scaling factors for a given group size.
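Group-wise scaling typically splits each row of the weight into blocks of `group_size`, takes the per-block absolute maximum (amax), and derives a scale from it. The sketch below assumes a symmetric integer range with maximum `qmax` (e.g. 7 for int4); the name and signature are illustrative, not the module's actual API.

```python
import torch

def group_amax_and_scale(weight, group_size, qmax=7.0):
    # Hypothetical sketch: reshape the last dim into groups of `group_size`,
    # take the per-group absolute maximum (amax), and derive a scaling
    # factor assuming a symmetric integer range with maximum `qmax`.
    out_features, in_features = weight.shape
    groups = weight.reshape(out_features, in_features // group_size, group_size)
    amax = groups.abs().amax(dim=-1)
    scale = amax / qmax
    return scale, amax
```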

resmooth_and_get_scale_and_amax(merged_weights, pre_quant_scales, ranks, group_size, avg_pre_quant_scale=None)

Resmooths weights from one or multiple ranks and gets scaling factors and amax.

Parameters:
  • merged_weights (Tensor) – Merged weights from ranks.

  • pre_quant_scales (List[Tensor]) – List of pre-quantization scales for each rank.

  • ranks (int) – Number of ranks.

  • group_size (int) – Group size of the quantization block.

  • avg_pre_quant_scale (Tensor, optional) – If not provided, weights are resmoothed using the average of pre_quant_scales.

Returns:

  • weights – Resmoothed weights.

  • weight_scaling_factors – Resmoothed scaling factors.

  • avg_pre_quant_scale – Calculated average of the pre-quantization scales.

  • amaxes – Amax values for the weights.

Return type:

Tuple[Tensor, Tensor, Tensor, Tensor]
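The core resmoothing step can be sketched as follows: undo each rank's own pre-quant scale over the input channels and re-apply the shared average, so all ranks quantize against one scale. This simplified sketch infers the rank count from the scale list, assumes the merged weights are concatenated along the output dimension, and omits the group-size handling; it is not the module's actual implementation.

```python
import torch

def resmooth_sketch(merged_weights, pre_quant_scales):
    # Hypothetical sketch: each rank's weight slice was smoothed with that
    # rank's pre_quant_scale over the input channels.  Divide out the
    # per-rank scale and multiply by the average so every rank shares one.
    ranks = len(pre_quant_scales)
    avg_scale = torch.stack(pre_quant_scales).mean(dim=0)
    chunks = merged_weights.chunk(ranks, dim=0)  # assumed output-dim merge
    resmoothed = torch.cat(
        [w / s * avg_scale for w, s in zip(chunks, pre_quant_scales)], dim=0
    )
    return resmoothed, avg_scale
```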