qwen3_vl_model_descriptor

Classes

Qwen3VLModelDescriptor

Qwen3VLFFNIntermediateLayerDescriptor

Qwen3VLFFNIntermediateLayerDescriptor(down_proj_name: str = 'mlp.down_proj', ffn_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp', linear_weight_names: List[str] = <factory>)

Qwen3VLKVHeadsLayerDescriptor

Qwen3VLKVHeadsLayerDescriptor(o_proj_name: str = 'self_attn.o_proj', attn_prefix_name: str = 'model.language_model.layers.{layer_idx}.self_attn', qkvo_weight_names: List[str] = <factory>)

Qwen3VLExpertRemovalLayerDescriptor

Qwen3-VL MoE layer descriptor.

class Qwen3VLExpertRemovalLayerDescriptor

Bases: ExpertRemovalLayerDescriptor

Qwen3-VL MoE layer descriptor.

Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py

  • Qwen3VLMoeTextSparseMoeBlock: MoE block with .gate (router) and .experts

  • Qwen3VLMoeTextTopKRouter: router with .weight (no bias)

  • Qwen3VLMoeTextExperts: fused experts with .gate_up_proj and .down_proj tensors

__init__(target_name='mlp', moe_prefix_name='model.language_model.layers.{layer_idx}.mlp', expert_prefix_name='', router_weights=<factory>, router_biases=<factory>, expert_weights=<factory>, expert_biases=<factory>, is_fused_experts=True, fused_expert_weights=<factory>)
Parameters:
  • target_name (str)

  • moe_prefix_name (str)

  • expert_prefix_name (str)

  • router_weights (List[str])

  • router_biases (List[str])

  • expert_weights (List[str])

  • expert_biases (List[str])

  • is_fused_experts (bool)

  • fused_expert_weights (List[str])

Return type:

None

fused_expert_weights: List[str]

Fused expert weight names relative to moe_prefix, e.g. ["experts.gate_up_proj", "experts.down_proj"].

is_fused_experts: bool = True

If True, experts are stored as single fused tensors (shape [num_experts, ...]).

moe_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp'

MoE prefix layer name with {layer_idx} placeholder, e.g. model.layers.{layer_idx}.moe.

router_biases: List[str]

Router bias names relative to moe_prefix.

router_weights: List[str]

Router weight names relative to moe_prefix.

target_name: str = 'mlp'

Module name for hook registration; supports a regex: prefix.
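The descriptor's prefix and relative weight names combine into full parameter names. A minimal sketch (plain Python, not library code) of how the {layer_idx} placeholder in moe_prefix_name resolves against the fused_expert_weights defaults documented above:

```python
def resolve_names(prefix_template: str, relative_names: list, layer_idx: int) -> list:
    """Format the {layer_idx} placeholder and join each relative name."""
    prefix = prefix_template.format(layer_idx=layer_idx)
    return [f"{prefix}.{name}" for name in relative_names]

moe_prefix = "model.language_model.layers.{layer_idx}.mlp"
fused = ["experts.gate_up_proj", "experts.down_proj"]

print(resolve_names(moe_prefix, fused, 3))
# ['model.language_model.layers.3.mlp.experts.gate_up_proj',
#  'model.language_model.layers.3.mlp.experts.down_proj']
```

The helper name resolve_names is illustrative only; the library may perform this resolution internally under a different name.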

class Qwen3VLFFNIntermediateLayerDescriptor

Bases: FFNIntermediateLayerDescriptor

Qwen3VLFFNIntermediateLayerDescriptor(down_proj_name: str = 'mlp.down_proj', ffn_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp', linear_weight_names: List[str] = <factory>)

__init__(down_proj_name='mlp.down_proj', ffn_prefix_name='model.language_model.layers.{layer_idx}.mlp', linear_weight_names=<factory>)
Parameters:
  • down_proj_name (str)

  • ffn_prefix_name (str)

  • linear_weight_names (List[str])

Return type:

None

down_proj_name: str = 'mlp.down_proj'
ffn_prefix_name: str = 'model.language_model.layers.{layer_idx}.mlp'
linear_weight_names: List[str]
class Qwen3VLKVHeadsLayerDescriptor

Bases: KVHeadsLayerDescriptor

Qwen3VLKVHeadsLayerDescriptor(o_proj_name: str = 'self_attn.o_proj', attn_prefix_name: str = 'model.language_model.layers.{layer_idx}.self_attn', qkvo_weight_names: List[str] = <factory>)

__init__(o_proj_name='self_attn.o_proj', attn_prefix_name='model.language_model.layers.{layer_idx}.self_attn', qkvo_weight_names=<factory>)
Parameters:
  • o_proj_name (str)

  • attn_prefix_name (str)

  • qkvo_weight_names (List[str])

Return type:

None

attn_prefix_name: str = 'model.language_model.layers.{layer_idx}.self_attn'
o_proj_name: str = 'self_attn.o_proj'
qkvo_weight_names: List[str]
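As with the other descriptors, attn_prefix_name is a template keyed on {layer_idx}. A sketch of resolving it for one layer; the q_proj/k_proj/v_proj/o_proj names below are assumed (standard Qwen-style attention), since the actual qkvo_weight_names factory defaults are not shown in this reference:

```python
attn_prefix = "model.language_model.layers.{layer_idx}.self_attn"
# Assumed weight names for illustration; the real factory defaults may differ.
qkvo = ["q_proj.weight", "k_proj.weight", "v_proj.weight", "o_proj.weight"]

layer0 = attn_prefix.format(layer_idx=0)
full_names = [f"{layer0}.{name}" for name in qkvo]
print(full_names[0])
# model.language_model.layers.0.self_attn.q_proj.weight
```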
class Qwen3VLModelDescriptor

Bases: ModelDescriptor

static attn_no_op_post_init(decoder_layer)
Parameters:
  • decoder_layer (Qwen3VLMoeTextDecoderLayer)

static block_config_to_layer_overrides(block_config)
Parameters:
  • block_config (BlockConfig)

static decoder_layer_cls()
static final_norm_name()
static get_language_model_config(config)

Qwen3-VL has a nested text_config for language model parameters.

static init_rotary_embedding(model, runtime)
static input_embedding_name()
static layer_block_name(index)
Parameters:
  • index (int)

static layer_name_predicates(num_layers)
Parameters:
  • num_layers (int)

Return type:

Dict[str, Pattern]
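The return type Dict[str, Pattern] suggests one compiled pattern per named layer group. A hypothetical sketch of that shape (not the library's actual implementation), using the model.language_model.layers.{layer_idx} prefix convention documented above:

```python
import re
from typing import Dict, Pattern

def layer_name_predicates(num_layers: int) -> Dict[str, Pattern]:
    """Map each decoder layer to a regex matching its module names."""
    return {
        f"layers.{i}": re.compile(rf"model\.language_model\.layers\.{i}\.")
        for i in range(num_layers)
    }

preds = layer_name_predicates(2)
# preds["layers.1"] matches names under the second decoder layer:
assert preds["layers.1"].match("model.language_model.layers.1.self_attn.o_proj")
assert not preds["layers.0"].match("model.language_model.layers.1.mlp")
```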

static mlp_no_op_post_init(decoder_layer)
Parameters:
  • decoder_layer (Qwen3VLMoeTextDecoderLayer)

static output_embedding_name()
static uses_autocast()

Qwen3-VL MoE has a dtype bug in HuggingFace Transformers under torch.autocast: scatter() in MoE routing fails with a dtype mismatch. Use native bfloat16 instead. See: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct (recommended approach)

Return type:

bool
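The failure mode described above can be reproduced in miniature with plain torch (an assumed minimal repro, not the actual HF code path): autocast downcasts matmul outputs to bfloat16 while tensors allocated with an explicit float32 dtype stay float32, so an in-place scatter_ between the two raises a dtype-mismatch RuntimeError. Casting everything natively to bfloat16, as uses_autocast() recommends, keeps dtypes consistent:

```python
import torch

x = torch.randn(2, 4)
w = torch.randn(4, 4)

# Under autocast, CPU matmul outputs are downcast to bfloat16 ...
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = x @ w
assert y.dtype == torch.bfloat16

# ... while explicitly allocated float32 tensors are not, so scatter_
# between the two dtypes raises a RuntimeError:
dest = torch.zeros(2, 4, dtype=torch.float32)
idx = torch.zeros(2, 1, dtype=torch.long)
try:
    dest.scatter_(1, idx, y[:, :1])
    mismatch = False
except RuntimeError:
    mismatch = True

# Native bfloat16 keeps every tensor in one dtype, so the same scatter_
# succeeds:
dest_bf16 = torch.zeros(2, 4, dtype=torch.bfloat16)
dest_bf16.scatter_(1, idx, (x.bfloat16() @ w.bfloat16())[:, :1])
```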