layer_utils
Utils for model_config export.
Some of the logic in this file is empirical and needs to be updated whenever exceptions occur.
Functions

| Function | Description |
| --- | --- |
| `build_attention_config` | Builds the attention config from the module. |
| `build_conv_config` | Builds the conv config for this module. |
| `build_decoder_config` | Builds the full decoder config from the module. |
| `build_embedding_config` | Builds the embedding config from the module. |
| `build_layernorm_config` | Builds the layernorm config from the module. |
| `build_linear_config` | Builds the linear config for the module. |
| `build_medusa_heads_config` | Builds a list of MedusaHeadConfig if present. |
| `build_mlp_config` | Builds the MLP config for the module. |
| `build_moe_config` | Builds the MOE config for the module. |
| `build_qkv` | Converts the qkv modules to the config. |
| `build_recurrent_config` | Builds the recurrent config for this module. |
| `build_stacked_experts` | Builds the experts_weight_1 and experts_weight_2 configs for the experts. |
| `check_model_compatibility` | Returns whether the list of modules is compatible with the export logic. |
| `dup_kv_weight` | Repeats kv heads if tp_size > num_kv_heads. |
| `get_activation_scaling_factor` | Returns the activation scaling factor. |
| `get_kv_cache_dtype` | Returns the kv_cache dtype. |
| `get_kv_cache_scaling_factor` | Returns the kv_cache scaling factor if the output quantizer is set. |
| `get_prequant_scaling_factor` | Returns the prequant scaling factor. |
| `get_qkv_and_avg_prequant_scale` | Gets the qkv and average prequant scaling factor for the module. |
| `get_quantization_format` | Gets the quantization string. |
| `get_scaling_factor` | Returns the scaling factor from the quantizer as a torch.Tensor. |
| `get_transformer_layers` | Returns the root module of the transformer model. |
| `get_weight_block_size` | Returns the weight block size. |
| `get_weight_scaling_factor` | Returns the weight scaling factor. |
| `get_weight_scaling_factor_2` | Returns the secondary weight scaling factor. |
| `is_attention` | Returns whether the module is an attention layer. |
| `is_decoder_list` | Returns whether the module is a decoder list. |
| `is_embedding` | Returns whether the module is an embedding layer. |
| `is_layernorm` | Returns whether the module is a layernorm layer. |
| `is_linear` | Returns whether the module is a linear layer. |
| `is_mlp` | Returns whether the module is an MLP layer. |
| `is_moe` | Returns whether the module is an MOE layer. |
| `is_quantlinear` | Returns whether the module is a quantized linear layer. |
| `is_recurrent` | Returns whether the module is a recurrent layer. |
- build_attention_config(module, model_metadata_config, dtype, ext_config=None, tp_size=1)
Builds the attention config from the module.
- Parameters:
module (Module) –
dtype (dtype) –
ext_config (DecoderLayerConfig) –
tp_size (int) –
- Return type:
- build_conv_config(module, dtype)
Builds the conv config for this module.
- Parameters:
module (Module) –
dtype (dtype) –
- Return type:
- build_decoder_config(module, model_metadata_config, decoder_type, dtype, tp_size=1)
Builds the full decoder config from the module.
- Parameters:
module (Module) –
decoder_type (str) –
dtype (dtype) –
tp_size (int) –
- Return type:
- build_embedding_config(module, dtype, normalization_constant=1)
Builds the embedding config from the module.
- Parameters:
module (Module) –
dtype (dtype) –
normalization_constant (float) –
- Return type:
- build_layernorm_config(module, dtype)
Builds the layernorm config from the module.
- Parameters:
module (Module) –
dtype (dtype) –
- Return type:
- build_linear_config(module, linear_type, dtype)
Builds the linear config for the module.
- Parameters:
module (Module) –
linear_type (str) –
dtype (dtype) –
- Return type:
- build_medusa_heads_config(model, dtype)
Builds a list of MedusaHeadConfig if present.
Following TensorRT-LLM's Medusa implementation, all Medusa heads (num_medusa_heads of them) should be placed inside a 'torch.nn.ModuleList' with the attribute name 'medusa_heads'. Each Medusa head consists of an additional 'lm_head' (vocab_size, hidden_size) and a list of num_medusa_layers Medusa layers (LinearActConfig). The only supported hidden_act for these layers is 'silu'. All Linear layers are column-parallel. See the sketch below for the expected module layout.
- Parameters:
model (Module | None) –
dtype (dtype) –
- Return type:
List[MedusaHeadConfig] | None
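As an illustration of the layout build_medusa_heads_config expects, here is a minimal sketch. The 'medusa_heads' and 'lm_head' attribute names and the 'silu' activation follow the description above; the class names, the 'medusa_layers' attribute, and all sizes are assumptions for illustration only.

```python
import torch.nn as nn

hidden_size, vocab_size = 4096, 32000
num_medusa_heads, num_medusa_layers = 4, 1

class MedusaHead(nn.Module):
    """Illustrative stand-in for one Medusa head."""
    def __init__(self):
        super().__init__()
        # Each head carries its own lm_head projecting hidden states to the vocabulary.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Plus num_medusa_layers linear layers; 'silu' is the only supported activation.
        self.medusa_layers = nn.ModuleList(  # attribute name is an assumption
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU())
            for _ in range(num_medusa_layers)
        )

class ModelWithMedusa(nn.Module):
    """The exporter looks for a 'medusa_heads' ModuleList on the model."""
    def __init__(self, base_model: nn.Module):
        super().__init__()
        self.base_model = base_model
        self.medusa_heads = nn.ModuleList(MedusaHead() for _ in range(num_medusa_heads))
```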
- build_mlp_config(module, decoder_type, dtype)
Builds the MLP config for the module.
- Parameters:
module (Module) –
dtype (dtype) –
- Return type:
- build_moe_config(module, decoder_type, dtype)
Builds the MOE config for the module.
- Parameters:
module (Module) –
dtype (dtype) –
- Return type:
- build_qkv(qkv_modules, model_metadata_config, dtype, ext_config=None, tp_size=1)
Converts the qkv modules to the config.
- Parameters:
qkv_modules (List[Module]) –
dtype (dtype) –
ext_config (DecoderLayerConfig) –
tp_size (int) –
- Return type:
- build_recurrent_config(module, dtype)
Builds the recurrent config for this module.
- Parameters:
module (Module) –
dtype (dtype) –
- build_stacked_experts(experts, dtype, linear_names, num_experts, expert_getter)
Builds the experts_weight_1 and experts_weight_2 configs for the experts.
- Parameters:
experts (Module) –
dtype (dtype) –
linear_names (List[str]) –
- check_model_compatibility(module_list)
Returns whether the list of modules is compatible with the export logic.
It also reports whether a positional embedding and an embedding layernorm exist.
We assume the model is assembled from one or two embedding layers, a ModuleList of transformer decoders, and a final layernorm, with an optional embedding layernorm; other layouts are not supported. A sketch of a compatible layout follows this entry.
- Parameters:
module_list (List[Module]) –
- Return type:
Tuple[bool, bool, bool]
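A minimal sketch of a layout that satisfies the assumptions above. The class and attribute names are illustrative, and the calling convention shown in the closing comment (passing the model's top-level children) is an assumption.

```python
import torch.nn as nn

class ToyDecoderLayer(nn.Module):
    """Stand-in for a real transformer decoder layer (attention + MLP)."""
    def __init__(self, hidden: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(),
                                 nn.Linear(4 * hidden, hidden))

class ToyTransformer(nn.Module):
    """One embedding, a ModuleList of decoders, and a final layernorm."""
    def __init__(self, vocab: int = 32000, hidden: int = 1024, num_layers: int = 2):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, hidden)
        self.layers = nn.ModuleList(ToyDecoderLayer(hidden) for _ in range(num_layers))
        self.final_layernorm = nn.LayerNorm(hidden)

# e.g. check_model_compatibility(list(model.children())) would return a
# (compatible, has_positional_embedding, has_embedding_layernorm) tuple of bools.
```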
- dup_kv_weight(v, head_size, num_head, tp_size)
Repeat kv heads if tp_size > num_kv_heads.
- Parameters:
v (Tensor) –
head_size (int) –
num_head (int) –
tp_size (int) –
- Return type:
Tensor
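An illustrative sketch of the head-duplication idea behind dup_kv_weight, not the library's exact implementation; it assumes tp_size is a multiple of num_head and that the weight is laid out as (num_head * head_size, hidden).

```python
import torch

def dup_kv_weight_sketch(v: torch.Tensor, head_size: int, num_head: int, tp_size: int) -> torch.Tensor:
    """Repeat each KV head so every tensor-parallel rank gets at least one head."""
    if tp_size <= num_head:
        return v
    reps = tp_size // num_head  # assumes tp_size is a multiple of num_head
    hidden = v.shape[-1]
    v = v.reshape(num_head, head_size, hidden)  # split rows into per-head blocks
    v = v.repeat_interleave(reps, dim=0)        # duplicate each head 'reps' times
    return v.reshape(num_head * reps * head_size, hidden)

# 2 KV heads replicated to serve tp_size=8 ranks -> 8 heads.
w = torch.randn(2 * 128, 1024)
print(dup_kv_weight_sketch(w, head_size=128, num_head=2, tp_size=8).shape)  # torch.Size([1024, 1024])
```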
- get_activation_scaling_factor(module)
Returns the activation scaling factor.
- Parameters:
module (Module) –
- Return type:
Tensor
- get_kv_cache_dtype(modules)
Returns the kv_cache dtype.
If the output_quantizer's num_bits is (4, 3), returns FP8; if it is 8, returns int8; otherwise returns None.
- Parameters:
modules (Union[List[nn.Module], nn.Module]) – The module or list of modules to inspect.
- Returns:
The kv_cache dtype.
- Return type:
str
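A hedged sketch of the decision rule described above; the output_quantizer attribute access, the is_enabled check, and the exact returned strings are assumptions.

```python
from typing import List, Optional, Union
import torch.nn as nn

def kv_cache_dtype_sketch(modules: Union[List[nn.Module], nn.Module]) -> Optional[str]:
    """(4, 3)-bit (E4M3) output quantizers map to FP8, 8-bit to int8, anything else to None."""
    if isinstance(modules, nn.Module):
        modules = [modules]
    num_bits_seen = set()
    for module in modules:
        quantizer = getattr(module, "output_quantizer", None)  # assumed attribute name
        if quantizer is not None and getattr(quantizer, "is_enabled", True):
            num_bits_seen.add(quantizer.num_bits)
    if (4, 3) in num_bits_seen:
        return "FP8"
    if 8 in num_bits_seen:
        return "int8"
    return None
```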
- get_kv_cache_scaling_factor(qkv_modules)
Returns the kv_cache scaling factor if the output quantizer is set; otherwise returns None.
- Parameters:
qkv_modules (List[Module]) –
- Return type:
Tensor
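A rough sketch of the conditional behavior of get_kv_cache_scaling_factor; the attribute names and the max() reduction across the q/k/v output quantizers are assumptions.

```python
from typing import List, Optional
import torch
import torch.nn as nn

def kv_cache_scaling_factor_sketch(qkv_modules: List[nn.Module]) -> Optional[torch.Tensor]:
    """Return a single kv_cache scale only when output quantizers are present; else None."""
    scales = []
    for module in qkv_modules:
        quantizer = getattr(module, "output_quantizer", None)  # assumed attribute name
        if quantizer is None or not getattr(quantizer, "is_enabled", False):
            return None  # no output quantizer -> no kv_cache scaling factor
        # In practice the per-quantizer value would come from get_scaling_factor(quantizer);
        # here we read a hypothetical precomputed attribute.
        scales.append(quantizer.scaling_factor)
    # Reducing with max() is an assumption about how one factor is chosen for k and v.
    return torch.stack(scales).max()
```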
- get_prequant_scaling_factor(module, dtype)
Returns the prequant scaling factor.
- Parameters:
module (Module) –
dtype (dtype) –
- Return type:
Tensor
- get_qkv_and_avg_prequant_scale(module, dtype)
Get the qkv and average prequant scaling factor for the module.
- Parameters:
module – The module containing q, k, and v submodules.
dtype – The data type for the scaling factors.
- Returns:
- A tuple containing the average prequant scaling factor and individual
scaling factors for q, k, and v.
- Return type:
tuple
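A small worked sketch of the averaging implied by the return value, assumed here to be element-wise across q, k, and v. The concrete numbers are made up; in practice the per-projection scales would come from get_prequant_scaling_factor on the q/k/v submodules.

```python
import torch

def avg_prequant_scale_sketch(q_scale, k_scale, v_scale):
    """Return (avg, q, k, v); the average can be shared by a fused QKV projection."""
    avg = (q_scale + k_scale + v_scale) / 3.0
    return avg, q_scale, k_scale, v_scale

# Hypothetical per-projection prequant scales.
q = torch.tensor([0.9, 1.1, 1.0])
k = torch.tensor([1.0, 0.8, 1.1])
v = torch.tensor([1.1, 1.0, 0.9])
print(avg_prequant_scale_sketch(q, k, v)[0])  # tensor([1.0000, 0.9667, 1.0000])
```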
- get_quantization_format(module)
Gets the quantization string.
Gets the quantization string by iterating through the module and its children. The first non-None quantization string is returned.
- Return type:
str | None
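A hedged sketch of the "first non-None wins" traversal described above; quantization_format_of is a hypothetical placeholder for the library's per-module detection.

```python
from typing import Optional
import torch.nn as nn

def quantization_format_of(submodule: nn.Module) -> Optional[str]:
    """Hypothetical per-module detection; here it just reads an assumed attribute."""
    return getattr(submodule, "quantization_format", None)

def quantization_format_sketch(module: nn.Module) -> Optional[str]:
    """Walk the module and its children; the first non-None quantization string wins."""
    for submodule in module.modules():  # yields `module` itself, then every descendant
        fmt = quantization_format_of(submodule)
        if fmt is not None:
            return fmt
    return None
```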
- get_scaling_factor(quantizer)
Returns scaling factor from the quantizer as torch.Tensor.
- Parameters:
quantizer (TensorQuantizer) –
- Return type:
Tensor
- get_transformer_layers(model)
Returns the root module of the transformer model.
- Parameters:
model (Module) –
- Return type:
List[Module]
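A hedged usage sketch combining get_transformer_layers with the is_* predicates documented further down; the import path and the GPT-2 example model are assumptions, not requirements.

```python
from transformers import AutoModelForCausalLM

# Import path is an assumption; the names and signatures are as documented on this page.
from modelopt.torch.export.layer_utils import (
    get_transformer_layers,
    is_attention,
    is_decoder_list,
    is_embedding,
    is_layernorm,
)

model = AutoModelForCausalLM.from_pretrained("gpt2")
for layer in get_transformer_layers(model):
    kind = (
        "embedding" if is_embedding(layer)
        else "decoder_list" if is_decoder_list(layer)
        else "layernorm" if is_layernorm(layer)
        else "attention" if is_attention(layer)
        else "other"
    )
    print(type(layer).__name__, kind)
```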
- get_weight_block_size(module)
Returns the weight block size.
- Parameters:
module (Module) –
- Return type:
int
- get_weight_scaling_factor(module)
Returns the weight scaling factor.
- Parameters:
module (Module) –
- Return type:
Tensor
- get_weight_scaling_factor_2(module)
Returns the secondary weight scaling factor.
- Parameters:
module (Module) –
- Return type:
Tensor
- is_attention(module)
Returns whether the module is an attention layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_decoder_list(module)
Returns whether the module is a decoder list.
- Parameters:
module (Module) –
- Return type:
bool
- is_embedding(module)
Returns whether the module is an embedding layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_layernorm(module)
Returns whether the module is a layernorm layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_linear(module)
Returns whether the module is a linear layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_mlp(module)
Returns whether the module is an MLP layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_moe(module)
Returns whether the module is an MOE layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_quantlinear(module)
Returns whether the module is a quantized linear layer.
- Parameters:
module (Module) –
- Return type:
bool
- is_recurrent(module)
Returns whether the module is a recurrent layer.
- Parameters:
module (Module) –
- Return type:
bool