model_config_utils

Common utilities for ModelConfig.

Functions

from_quantized_weight

Converts the quantized weight to the target torch_dtype format.

merge_fc1_gate

Merges the fc1 and gate fields in model_config into a single LinearConfig.

merge_qkv

Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.

model_config_from_dict

Loads a dict into a ModelConfig instance.

model_config_to_dict

Converts the ModelConfig instance to a Python dict.

naive_quantization

Generates a constant scaling factor (1) with target quantization.

pack_linear_weights

Packs the quantized linear weights in the model_config to the quantized format.

pad_weights

Returns the weights padded to tp_size.

restore_model_config

Recursively restores the model_config from json and loads np.ndarray or torch.Tensor weights from weights.

split_config_and_weights

Utility function that moves the weights, or any torch.Tensor, found in a nested config into the weights dict.

to_quantized_weight

Converts the weight to the quantized (packed) format.

from_quantized_weight(weight, weights_scaling_factor, quantization, torch_dtype)

Converts the quantized weight to the target torch_dtype format.

Parameters:
  • weight (Tensor) –

  • weights_scaling_factor (Tensor) –

  • quantization (str) –
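
As a rough illustration of what converting a quantized weight back to floating point involves, the sketch below multiplies each quantized value by a scaling factor. It uses plain Python lists instead of tensors, and the helper name dequantize and the one-scale-per-row layout are assumptions made for the example, not the library's implementation.

```python
def dequantize(weight, scale):
    """Hypothetical helper: scale each quantized row back to floating point."""
    return [[float(q) * s for q in row] for row, s in zip(weight, scale)]


# int8-style values, one row per output channel (assumed layout)
quantized = [[127, -64], [32, 0]]
# one scaling factor per row (assumed layout)
scales = [0.5, 0.25]

restored = dequantize(quantized, scales)
print(restored)
```

The real function also takes the quantization string and the target torch_dtype, which presumably select the format-specific unpacking and the output dtype.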

merge_fc1_gate(model_config)

Merges the fc1 and gate fields in model_config into a single LinearConfig.

merge_qkv(model_config)

Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.
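
One plausible reading of merging qkv into a single linear layer is stacking the q, k, and v weight rows so that one matrix multiplication produces all three projections. The sketch below shows that idea only; Linear and QKV are hypothetical stand-ins for LinearConfig and QKVConfig, and the actual function may treat biases, scaling factors, or head layouts differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Linear:
    """Hypothetical stand-in for LinearConfig."""
    weight: List[List[float]]  # shape [out_features, in_features]


@dataclass
class QKV:
    """Hypothetical stand-in for QKVConfig."""
    q: Linear
    k: Linear
    v: Linear


def merge(qkv: QKV) -> Linear:
    # Stack the q, k, v rows along the output dimension.
    return Linear(weight=qkv.q.weight + qkv.k.weight + qkv.v.weight)


merged = merge(QKV(Linear([[1.0, 0.0]]), Linear([[0.0, 2.0]]), Linear([[3.0, 3.0]])))
print(merged.weight)
```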

model_config_from_dict(d)

Loads a dict into a ModelConfig instance.

Parameters:

d (dict) –

Return type:

ModelConfig

model_config_to_dict(model_config)

Converts the ModelConfig instance to a Python dict.

Parameters:

model_config (ModelConfig) –

Return type:

dict
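
The dict conversion pairs naturally with model_config_from_dict as a round trip. The sketch below shows the pattern with standard-library dataclasses; ToyModel, ToyLinear, and toy_from_dict are hypothetical names for illustration and do not reflect the real ModelConfig fields.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyLinear:
    """Hypothetical nested config."""
    weight: list
    bias: Optional[list] = None


@dataclass
class ToyModel:
    """Hypothetical top-level config."""
    dtype: str
    lm_head: ToyLinear


def toy_from_dict(d: dict) -> ToyModel:
    # Rebuild nested dataclasses from their dict form.
    return ToyModel(dtype=d["dtype"], lm_head=ToyLinear(**d["lm_head"]))


cfg = ToyModel(dtype="float16", lm_head=ToyLinear(weight=[[1.0, 2.0]]))
as_dict = asdict(cfg)              # instance -> plain dict
round_trip = toy_from_dict(as_dict)  # plain dict -> instance
```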

naive_quantization(config)

Generates a constant scaling factor (1) with target quantization.

This is for debugging and performance measurement only.

Parameters:

config (ModelConfig) –
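
To make the idea of a constant scaling factor concrete, the sketch below walks a nested config and attaches a scale of 1.0 next to every weight it finds. The dict layout and the helper name assign_unit_scales are assumptions for illustration; the key name weights_scaling_factor is borrowed from the parameter name used elsewhere on this page.

```python
def assign_unit_scales(config):
    """Hypothetical helper: set a scaling factor of 1.0 wherever a weight exists."""
    if isinstance(config, dict):
        if "weight" in config:
            config["weights_scaling_factor"] = 1.0
        for value in config.values():
            assign_unit_scales(value)
    elif isinstance(config, list):
        for item in config:
            assign_unit_scales(item)


toy = {
    "layers": [{"attention": {"qkv": {"weight": [[1.0]]}}}],
    "lm_head": {"weight": [[2.0]]},
}
assign_unit_scales(toy)
```

Because every scale is 1, the resulting model is not meaningfully calibrated, which fits the note that this is for debugging and performance measurement only.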

pack_linear_weights(model_config)

Packs the quantized linear weights in the model_config to the quantized format.

Parameters:

model_config (ModelConfig) –

pad_weights(weights, tp_size)

Returns the weights padded to tp_size.
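
A minimal sketch of the padding idea, assuming the first dimension is zero-padded up to the next multiple of tp_size so it divides evenly across tensor-parallel ranks. That interpretation, the helper name pad_rows, and the use of nested lists in place of tensors are all assumptions for the example.

```python
import math


def pad_rows(weights, tp_size):
    """Hypothetical helper: zero-pad rows up to the next multiple of tp_size."""
    rows = len(weights)
    padded_rows = math.ceil(rows / tp_size) * tp_size
    width = len(weights[0])
    return weights + [[0.0] * width for _ in range(padded_rows - rows)]


padded = pad_rows([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], tp_size=2)
unchanged = pad_rows([[1.0, 2.0], [3.0, 4.0]], tp_size=2)
```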

restore_model_config(model_config, weights)

Recursively restores the model_config from json and loads np.ndarray or torch.Tensor weights from weights.

Parameters:

weights (Dict[str, ndarray | Tensor]) –
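
The recursive restore can be pictured as walking the JSON-decoded config and swapping each weight reference for the matching entry in weights. The sketch below assumes references are plain strings equal to keys of the weights dict; the helper name restore and that matching rule are illustrative guesses, not the library's logic.

```python
import json


def restore(node, weights):
    """Hypothetical helper: replace weight IDs in a nested config with the weights."""
    if isinstance(node, dict):
        return {key: restore(value, weights) for key, value in node.items()}
    if isinstance(node, list):
        return [restore(value, weights) for value in node]
    if isinstance(node, str) and node in weights:
        return weights[node]
    return node


config = json.loads('{"dtype": "float16", "lm_head": {"weight": "lm_head.weight"}}')
weights = {"lm_head.weight": [[0.1, 0.2]]}  # lists stand in for arrays or tensors
restored_config = restore(config, weights)
```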

split_config_and_weights(config, weights, prefix='transformer')

Utility function that moves the weights, or any torch.Tensor, found in a nested config into the weights dict.

A weight ID starting with transformer or lm_head is also generated to link the original key to the weights dict. The weights in the weights dict are contiguous.

Parameters:
  • weights (Dict[str, tensor]) –

  • prefix (str) –
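
The split described above can be sketched as a recursive walk that moves each tensor into a flat dict and leaves its generated ID behind in the config. Tensor here is a hypothetical stand-in class, and the dotted ID scheme is an assumption; the real function works on torch.Tensor values and also makes them contiguous.

```python
class Tensor(list):
    """Hypothetical stand-in for torch.Tensor."""


def split(config, weights, prefix="transformer"):
    """Hypothetical helper: move tensors into weights, leaving their IDs in config."""
    for key, value in list(config.items()):
        name = f"{prefix}.{key}"
        if isinstance(value, Tensor):
            weights[name] = value  # store the tensor under its generated ID
            config[key] = name     # link the original key to the weights dict
        elif isinstance(value, dict):
            split(value, weights, name)


cfg = {"vocab_embedding": {"weight": Tensor([1.0, 2.0])}, "rank": 0}
out = {}
split(cfg, out)
```

After the call, cfg contains only JSON-serializable values, and out holds the tensors keyed by ID.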

to_quantized_weight(weight, weights_scaling_factor, quantization)

Converts the weight to the quantized (packed) format.

Parameters:
  • weight (Tensor) –

  • weights_scaling_factor (Tensor) –

  • quantization (str) –
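
For intuition, the sketch below quantizes floating-point weights to an int8 range by dividing by a per-row scaling factor, rounding, and clamping. The helper name quantize_int8, the per-row scale layout, and the int8 target are assumptions for illustration; formats that pack several values into one byte need an extra packing step not shown here.

```python
def quantize_int8(weight, scale):
    """Hypothetical helper: divide by the scale, round, and clamp to the int8 range."""
    def clamp(x):
        return max(-128, min(127, x))

    return [[clamp(round(w / s)) for w in row] for row, s in zip(weight, scale)]


# The last value (100.0 / 0.25 = 400) exceeds the int8 range and is clamped.
packed = quantize_int8([[63.5, -32.0], [8.0, 100.0]], [0.5, 0.25])
print(packed)
```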