layer_utils

Utils for model_config export.

Some of the logic in this file is empirical and needs to be updated whenever exceptions occur.

Functions

build_attention_config

Builds the attention config from the module.

build_conv_config

Builds the conv config for this module.

build_decoder_config

Builds the full decoder config from the module.

build_embedding_config

Builds the embedding config from the module.

build_layernorm_config

Builds the layernorm config from the module.

build_linear_config

Builds the linear config for the module.

build_medusa_heads_config

Builds a list of MedusaHeadConfig if Medusa heads exist.

build_mlp_config

Builds the MLP config for the module.

build_moe_config

Builds the MOE config for the module.

build_qkv

Converts the qkv modules to the config.

build_recurrent_config

Builds the recurrent config for this module.

build_stacked_experts

Builds the experts_weight_1 and experts_weight_2 configs for the experts.

check_model_compatibility

Returns whether the list of modules is compatible with the export logic.

get_activation_scaling_factor

Returns the activation scaling factor.

get_kv_cache_dtype

Returns the kv_cache dtype.

get_kv_cache_scaling_factor

Returns the kv_cache scaling factor if output quantizer is set.

get_prequant_scaling_factor

Returns the prequant scaling factor.

get_qkv_and_avg_prequant_scale

Get the qkv and average prequant scaling factor for the module.

get_quantization_format

Gets the quantization string.

get_scaling_factor

Returns scaling factor from the quantizer as torch.Tensor.

get_transformer_layers

Returns the root module of the transformer model.

get_weight_block_size

Returns the weight block size.

get_weight_scaling_factor

Returns the weight scaling factor.

get_weight_scaling_factor_2

Returns the secondary weight scaling factor.

is_attention

Returns whether the module is an attention layer.

is_decoder_list

Returns whether the module is a decoder list.

is_embedding

Returns whether the module is an embedding layer.

is_layernorm

Returns whether the module is a layernorm layer.

is_linear

Returns whether the module is a linear layer.

is_mlp

Returns whether the module is an MLP layer.

is_moe

Returns whether the module is an MOE layer.

is_quantlinear

Returns whether the module is a quantized linear layer.

is_recurrent

Returns whether the module is a recurrent layer.

build_attention_config(module, model_metadata_config, dtype, ext_config=None)

Builds the attention config from the module.

Return type:

AttentionConfig

build_conv_config(module, dtype)

Builds the conv config for this module.

Parameters:
  • module (Module) –

  • dtype (dtype) –

Return type:

ConvConfig

build_decoder_config(module, model_metadata_config, decoder_type, dtype)

Builds the full decoder config from the module.

Parameters:
  • module (Module) –

  • decoder_type (str) –

  • dtype (dtype) –

Return type:

DecoderLayerConfig
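
For illustration, a minimal sketch of exporting a single decoder block, assuming the module is importable as modelopt.torch.export.layer_utils (adjust to your installation), that the Hugging Face Llama-style attribute path model.model.layers applies, that an empty model_metadata_config is acceptable, and that "llama" is the right decoder_type string; none of these are mandated by this page:

    # Sketch: build a DecoderLayerConfig for one transformer block.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import build_decoder_config  # adjust import

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    block = model.model.layers[0]  # model-specific attribute path (assumption)

    layer_config = build_decoder_config(
        block,
        model_metadata_config={},  # assumption: no extra metadata needed here
        decoder_type="llama",      # assumption: model-family identifier string
        dtype=torch.float16,
    )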

build_embedding_config(module, dtype, normalization_constant=1)

Builds the embedding config from the module.

Parameters:
  • module (Module) –

  • dtype (dtype) –

  • normalization_constant (float) –

Return type:

EmbeddingConfig
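
For illustration, a short sketch assuming the same import path as above and the standard Hugging Face accessor model.get_input_embeddings(); normalization_constant is left at its default of 1:

    # Sketch: build an EmbeddingConfig for the token-embedding table.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import build_embedding_config, is_embedding

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    embedding = model.get_input_embeddings()
    assert is_embedding(embedding)

    embedding_config = build_embedding_config(embedding, dtype=torch.float16)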

build_layernorm_config(module, dtype)

Builds the layernorm config from the module.

Parameters:
  • module (Module) –

  • dtype (dtype) –

Return type:

LayernormConfig

build_linear_config(module, linear_type, dtype)

Builds the linear config for the module.

Parameters:
  • module (Module) –

  • linear_type (str) –

  • dtype (dtype) –

Return type:

LinearConfig

build_medusa_heads_config(model, dtype)

Builds a list of MedusaHeadConfig if Medusa heads exist.

Following TensorRT-LLM's Medusa implementation, all Medusa heads (num_medusa_heads) should be placed inside a 'torch.nn.ModuleList' with the attribute name 'medusa_heads'. A Medusa head consists of an additional 'lm_head' (vocab_size, hidden_size) and a list (num_medusa_layers) of Medusa layers (LinearActConfig). The only supported hidden_act for these layers is 'silu'. All Linear layers are column-parallel.

Parameters:
  • model (Module | None) –

  • dtype (dtype) –

Return type:

List[MedusaHeadConfig] | None
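
For illustration, a sketch of a module layout matching the description above. The class definition and the medusa_layers attribute name are assumptions for illustration only (if your Medusa implementation uses different internal names, the builder expects those names instead); medusa_heads and lm_head are the names the description requires:

    # Sketch: a hypothetical Medusa head layout attached to a stand-in root module.
    import torch
    import torch.nn as nn

    from modelopt.torch.export.layer_utils import build_medusa_heads_config  # adjust import

    class MedusaHead(nn.Module):
        def __init__(self, hidden_size: int, vocab_size: int, num_medusa_layers: int):
            super().__init__()
            # Stack of Medusa layers; the attribute name is an assumption.
            self.medusa_layers = nn.ModuleList(
                nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU())
                for _ in range(num_medusa_layers)
            )
            # The additional lm_head described above: (vocab_size, hidden_size).
            self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    model = nn.Module()  # stand-in root module for illustration
    # build_medusa_heads_config looks for a ModuleList named `medusa_heads`.
    model.medusa_heads = nn.ModuleList(
        MedusaHead(hidden_size=2048, vocab_size=32000, num_medusa_layers=1)
        for _ in range(4)  # num_medusa_heads
    )

    medusa_configs = build_medusa_heads_config(model, dtype=torch.float16)  # None if no heads found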

build_mlp_config(module, decoder_type, dtype)

Builds the MLP config for the module.

Parameters:
  • module (Module) –

  • dtype (dtype) –

Return type:

MLPConfig

build_moe_config(module, decoder_type, dtype)

Builds the MOE config for the module.

Parameters:
  • module (Module) –

  • dtype (dtype) –

Return type:

MOEConfig

build_qkv(qkv_modules, model_metadata_config, dtype, ext_config=None)

Converts the qkv modules to the config.

Return type:

QKVConfig
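
For illustration, a sketch that groups the q/k/v projections of one attention module, assuming the import path used above, the Llama-style q_proj/k_proj/v_proj attribute names, and an empty metadata dict:

    # Sketch: convert the separate q/k/v projections into a single QKVConfig.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import build_qkv  # adjust import

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    attn = model.model.layers[0].self_attn  # model-specific attribute path (assumption)

    qkv_config = build_qkv(
        [attn.q_proj, attn.k_proj, attn.v_proj],
        model_metadata_config={},  # assumption: no extra metadata needed here
        dtype=torch.float16,
    )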

build_recurrent_config(module, dtype)

Builds the recurrent config for this module.

Parameters:
  • module (Module) –

  • dtype (dtype) –

build_stacked_experts(experts, dtype, linear_names, num_experts, expert_getter)

Builds the experts_weight_1 and experts_weight_2 configs for the experts.

Parameters:
  • experts (Module) –

  • dtype (dtype) –

  • linear_names (List[str]) –

check_model_compatibility(module_list)

Returns whether the list of modules is compatible with the export logic, and whether a positional embedding and an embedding layernorm exist.

We assume the model is assembled from one or two embedding layers, a ModuleList of transformer decoders, and a final layernorm, with an optional embedding layernorm. Other layouts are not supported.

Parameters:

module_list (List[Module]) –

Return type:

Tuple[bool, bool, bool]
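
For illustration, a sketch that validates the model layout before export, assuming the import path used above; the interpretation of the three returned booleans (compatible, positional embedding present, embedding layernorm present) follows the description above:

    # Sketch: check exportability, then locate the decoder ModuleList.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import (
        check_model_compatibility,
        get_transformer_layers,
        is_decoder_list,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    modules = get_transformer_layers(model)
    compatible, has_pos_embedding, has_embedding_ln = check_model_compatibility(modules)
    if not compatible:
        raise ValueError("Model layout is not supported by the export logic.")

    # The ModuleList of transformer decoders is one of the returned modules.
    decoder_list = next(m for m in modules if is_decoder_list(m))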

get_activation_scaling_factor(module)

Returns the activation scaling factor.

Parameters:

module (Module) –

Return type:

Tensor

get_kv_cache_dtype(modules)

Returns the kv_cache dtype.

If the num_bits of the output_quantizer is (4, 3), returns FP8; if it is 8, returns int8; otherwise returns None.

Parameters:

modules (Union[List[nn.Module], nn.Module]) – The module or list of modules to inspect.

Returns:

The kv_cache dtype.

Return type:

str

get_kv_cache_scaling_factor(qkv_modules)

Returns the kv_cache scaling factor if the output quantizer is set; otherwise returns None.

Parameters:

qkv_modules (List[Module]) –

Return type:

Tensor
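
For illustration, a sketch that inspects the KV-cache settings of one attention module's q/k/v projections, assuming the import path and attribute names used above; on an unquantized model both calls simply report the absence of output quantizers:

    # Sketch: read the KV-cache dtype and scaling factor from the qkv projections.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import (
        get_kv_cache_dtype,
        get_kv_cache_scaling_factor,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    attn = model.model.layers[0].self_attn  # model-specific attribute path (assumption)
    qkv_modules = [attn.q_proj, attn.k_proj, attn.v_proj]

    kv_dtype = get_kv_cache_dtype(qkv_modules)           # "FP8", "int8", or None
    kv_scale = get_kv_cache_scaling_factor(qkv_modules)  # None without an output quantizer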

get_prequant_scaling_factor(module, dtype)

Returns the prequant scaling factor.

Parameters:
  • module (Module) –

  • dtype (dtype) –

Return type:

Tensor

get_qkv_and_avg_prequant_scale(module, dtype)

Get the qkv and average prequant scaling factor for the module.

Parameters:
  • module – The module containing q, k, and v submodules.

  • dtype – The data type for the scaling factors.

Returns:

A tuple containing the average prequant scaling factor and individual scaling factors for q, k, and v.

Return type:

tuple

get_quantization_format(module)

Gets the quantization string.

Gets the quantization string by iterating through the module and its children. The first non-None quantization string is returned.

Return type:

str | None
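
For illustration, a one-call sketch assuming the import path used above; on a model that has not been quantized the result is None:

    # Sketch: report the quantization format of a module tree.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import get_quantization_format

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    qformat = get_quantization_format(model)  # first non-None format found, or None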

get_scaling_factor(quantizer)

Returns scaling factor from the quantizer as torch.Tensor.

Parameters:

quantizer (TensorQuantizer) –

Return type:

Tensor

get_transformer_layers(model)

Returns the root module of the transformer model.

Parameters:

model (Module) –

Return type:

List[Module]

get_weight_block_size(module)

Returns the weight block size.

Parameters:

module (Module) –

Return type:

int

get_weight_scaling_factor(module)

Returns the weight scaling factor.

Parameters:

module (Module) –

Return type:

Tensor

get_weight_scaling_factor_2(module)

Returns the secondary weight scaling factor.

Parameters:

module (Module) –

Return type:

Tensor
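
For illustration, a sketch that walks a model and collects per-layer scaling factors for every quantized linear layer, assuming the import path used above; for a real export the model would be quantized first, and which of these scales are meaningful depends on the quantization format that was applied:

    # Sketch: gather weight/activation scaling factors per quantized linear layer.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import (
        get_activation_scaling_factor,
        get_weight_block_size,
        get_weight_scaling_factor,
        get_weight_scaling_factor_2,
        is_quantlinear,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )

    scales = {}
    for name, submodule in model.named_modules():
        if not is_quantlinear(submodule):
            continue  # skip everything that is not a quantized linear layer
        scales[name] = {
            "activation_scale": get_activation_scaling_factor(submodule),
            "weight_scale": get_weight_scaling_factor(submodule),
            "weight_scale_2": get_weight_scaling_factor_2(submodule),
            "weight_block_size": get_weight_block_size(submodule),
        }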

is_attention(module)

Returns whether the module is an attention layer.

Parameters:

module (Module) –

Return type:

bool

is_decoder_list(module)

Returns whether the module is a decoder list.

Parameters:

module (Module) –

Return type:

bool

is_embedding(module)

Returns whether the module is an embedding layer.

Parameters:

module (Module) –

Return type:

bool

is_layernorm(module)

Returns whether the module is a layernorm layer.

Parameters:

module (Module) –

Return type:

bool

is_linear(module)

Returns whether the module is a linear layer.

Parameters:

module (Module) –

Return type:

bool

is_mlp(module)

Returns whether the module is an MLP layer.

Parameters:

module (Module) –

Return type:

bool

is_moe(module)

Returns whether the module is an MOE layer.

Parameters:

module (Module) –

Return type:

bool

is_quantlinear(module)

Returns whether the module is a quantized linear layer.

Parameters:

module (Module) –

Return type:

bool

is_recurrent(module)

Returns whether the module is a recurrent layer.

Parameters:

module (Module) –

Return type:

bool
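
For illustration, a sketch that uses the is_* predicates to route a decoder block's children to the matching builders, under the same import-path, attribute-path, decoder_type, and metadata assumptions as the earlier sketches; real export code also handles qkv grouping and other details these checks do not cover:

    # Sketch: classify the children of one decoder block and build their configs.
    import torch
    from transformers import AutoModelForCausalLM

    from modelopt.torch.export.layer_utils import (
        build_attention_config,
        build_layernorm_config,
        build_mlp_config,
        build_moe_config,
        is_attention,
        is_layernorm,
        is_mlp,
        is_moe,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
    )
    block = model.model.layers[0]  # model-specific attribute path (assumption)

    for name, child in block.named_children():
        if is_attention(child):
            cfg = build_attention_config(child, {}, torch.float16)
        elif is_moe(child):
            cfg = build_moe_config(child, "llama", torch.float16)
        elif is_mlp(child):
            cfg = build_mlp_config(child, "llama", torch.float16)
        elif is_layernorm(child):
            cfg = build_layernorm_config(child, torch.float16)
        else:
            cfg = None
        print(name, type(cfg).__name__ if cfg is not None else "skipped")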