qwen3_vl_model_descriptor
Classes
Qwen3VLFFNIntermediateLayerDescriptor(down_proj_name: str = 'mlp.down_proj', ffn_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp', linear_weight_names: List[str] = <factory>)
Qwen3VLKVHeadsLayerDescriptor(o_proj_name: str = 'self_attn.o_proj', attn_prefix_name: str = 'model.language_model.layers.{layer_idx}.self_attn', qkvo_weight_names: List[str] = <factory>)
Qwen3VLExpertRemovalLayerDescriptor: Qwen3-VL MoE layer descriptor.
- class Qwen3VLExpertRemovalLayerDescriptor
Bases:
ExpertRemovalLayerDescriptor
Qwen3-VL MoE layer descriptor.
Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py
- Qwen3VLMoeTextSparseMoeBlock: MoE block with .gate (router) and .experts
- Qwen3VLMoeTextTopKRouter: Router with .weight (no bias)
- Qwen3VLMoeTextExperts: Fused experts with .gate_up_proj and .down_proj tensors
- __init__(target_name='mlp', moe_prefix_name='model.language_model.layers.{layer_idx}.mlp', expert_prefix_name='', router_weights=<factory>, router_biases=<factory>, expert_weights=<factory>, expert_biases=<factory>, is_fused_experts=True, fused_expert_weights=<factory>)
- Parameters:
target_name (str)
moe_prefix_name (str)
expert_prefix_name (str)
router_weights (List[str])
router_biases (List[str])
expert_weights (List[str])
expert_biases (List[str])
is_fused_experts (bool)
fused_expert_weights (List[str])
- Return type:
None
- fused_expert_weights: List[str]
Fused expert weight names relative to moe_prefix, e.g. ["experts.gate_up_proj", "experts.down_proj"].
- is_fused_experts: bool = True
If True, experts are stored as single fused tensors (shape [num_experts, ...]).
- moe_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp'
MoE prefix layer name with {layer_idx} placeholder, e.g. model.layers.{layer_idx}.moe.
- router_biases: List[str]
Router bias names relative to moe_prefix.
- router_weights: List[str]
Router weight names relative to moe_prefix.
- target_name: str = 'mlp'
Module name for hook registration; supports the regex: prefix.
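To illustrate how the name templates above combine, here is a minimal sketch with a hypothetical stand-in dataclass (not the library's class): the field defaults mirror the signature above, and the router weight name "gate.weight" is assumed from the referenced Qwen3VLMoeTextSparseMoeBlock layout (.gate router with .weight, no bias). The moe_prefix_name is formatted with a concrete layer_idx, and each weight name relative to the prefix is joined to it to yield fully qualified parameter names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpertRemovalNames:
    """Stand-in for the descriptor's name-template fields (illustration only)."""

    moe_prefix_name: str = "model.language_model.layers.{layer_idx}.mlp"
    # "gate.weight" is an assumption based on the .gate router noted above
    router_weights: List[str] = field(default_factory=lambda: ["gate.weight"])
    fused_expert_weights: List[str] = field(
        default_factory=lambda: ["experts.gate_up_proj", "experts.down_proj"]
    )

    def resolve(self, layer_idx: int) -> List[str]:
        # Fill the {layer_idx} placeholder, then qualify each relative name
        prefix = self.moe_prefix_name.format(layer_idx=layer_idx)
        relative = self.router_weights + self.fused_expert_weights
        return [f"{prefix}.{name}" for name in relative]


names = ExpertRemovalNames().resolve(layer_idx=3)
# names[0] is "model.language_model.layers.3.mlp.gate.weight"
```

The same resolve pattern applies to the FFN and KV-heads descriptors below, whose ffn_prefix_name and attn_prefix_name carry the identical {layer_idx} placeholder.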
- class Qwen3VLFFNIntermediateLayerDescriptor
Bases:
FFNIntermediateLayerDescriptor
Qwen3VLFFNIntermediateLayerDescriptor(down_proj_name: str = 'mlp.down_proj', ffn_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp', linear_weight_names: List[str] = <factory>)
- __init__(down_proj_name='mlp.down_proj', ffn_prefix_name='model.language_model.layers.{layer_idx}.mlp', linear_weight_names=<factory>)
- Parameters:
down_proj_name (str)
ffn_prefix_name (str)
linear_weight_names (List[str])
- Return type:
None
- down_proj_name: str = 'mlp.down_proj'
- ffn_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp'
- linear_weight_names: List[str]
- class Qwen3VLKVHeadsLayerDescriptor
Bases:
KVHeadsLayerDescriptor
Qwen3VLKVHeadsLayerDescriptor(o_proj_name: str = 'self_attn.o_proj', attn_prefix_name: str = 'model.language_model.layers.{layer_idx}.self_attn', qkvo_weight_names: List[str] = <factory>)
- __init__(o_proj_name='self_attn.o_proj', attn_prefix_name='model.language_model.layers.{layer_idx}.self_attn', qkvo_weight_names=<factory>)
- Parameters:
o_proj_name (str)
attn_prefix_name (str)
qkvo_weight_names (List[str])
- Return type:
None
- attn_prefix_name: str = 'model.language_model.layers.{layer_idx}.self_attn'
- o_proj_name: str = 'self_attn.o_proj'
- qkvo_weight_names: List[str]
- class Qwen3VLModelDescriptor
Bases:
ModelDescriptor
- static attn_no_op_post_init(decoder_layer)
- Parameters:
decoder_layer (Qwen3VLMoeTextDecoderLayer)
- static block_config_to_layer_overrides(block_config)
- Parameters:
block_config (BlockConfig)
- static decoder_layer_cls()
- static final_norm_name()
- static get_language_model_config(config)
Qwen3-VL has nested text_config for language model parameters.
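A sketch of why this accessor exists, using hypothetical stand-in objects rather than the library's config types: Qwen3-VL nests its language-model hyperparameters under config.text_config, whereas text-only models expose them at the top level, so a fallback-style lookup keeps callers uniform.

```python
from types import SimpleNamespace


def get_language_model_config(config):
    # Prefer the nested text_config when present (Qwen3-VL);
    # otherwise the config itself already is the language-model config.
    return getattr(config, "text_config", config)


# Hypothetical configs for illustration; the field values are made up.
vl_config = SimpleNamespace(text_config=SimpleNamespace(num_hidden_layers=48))
text_only_config = SimpleNamespace(num_hidden_layers=32)

lm_cfg = get_language_model_config(vl_config)       # the nested text_config
plain_cfg = get_language_model_config(text_only_config)  # the config itself
```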
- static init_rotary_embedding(model, runtime)
- static input_embedding_name()
- static layer_block_name(index)
- Parameters:
index (int)
- static layer_name_predicates(num_layers)
- Parameters:
num_layers (int)
- Return type:
Dict[str, Pattern]
- static mlp_no_op_post_init(decoder_layer)
- Parameters:
decoder_layer (Qwen3VLMoeTextDecoderLayer)
- static output_embedding_name()
- static uses_autocast()
Qwen3-VL MoE has a dtype bug in HuggingFace Transformers under torch.autocast: scatter() in the MoE routing fails with a dtype mismatch, so the model is run in native bfloat16 instead. See: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct (recommended approach)
- Return type:
bool
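A minimal sketch of how a caller might branch on this flag (hypothetical driver helper, not the library's code): when uses_autocast() returns False, the model's weights are kept natively in bfloat16 and no torch.autocast context is entered; with transformers this corresponds to loading the checkpoint with torch_dtype=torch.bfloat16.

```python
def precision_plan(uses_autocast: bool) -> dict:
    """Pick the weight dtype and autocast setting from the descriptor's flag."""
    if uses_autocast:
        # Default path: fp32 weights, torch.autocast downcasts per-op.
        return {"model_dtype": "float32", "autocast": True}
    # Qwen3-VL MoE path: scatter() in MoE routing breaks under autocast,
    # so run the whole model natively in bfloat16 with autocast disabled.
    return {"model_dtype": "bfloat16", "autocast": False}


plan = precision_plan(uses_autocast=False)
```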