qwen2_model_descriptor
Qwen2 model descriptor for AnyModel compression.
Classes
- Qwen2ModelDescriptor: Model descriptor for Qwen2 models.
- Qwen2FFNIntermediateLayerDescriptor: Layer descriptor for Qwen2 FFN intermediate pruning.
- class Qwen2FFNIntermediateLayerDescriptor
Bases: LlamaFFNIntermediateLayerDescriptor
Layer descriptor for Qwen2 FFN intermediate pruning.
Qwen2 uses the same FFN structure as Llama (gate_proj, up_proj, down_proj).
- __init__(down_proj_name='mlp.down_proj', ffn_prefix_name='model.layers.{layer_idx}.mlp', linear_weight_names=<factory>)
- Parameters:
down_proj_name (str)
ffn_prefix_name (str)
linear_weight_names (List[str])
- Return type:
None
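As a rough illustration of what the documented defaults encode, the sketch below mirrors the signature above with a plain dataclass. This is not the library class (the real one inherits from LlamaFFNIntermediateLayerDescriptor); in particular, the factory default for linear_weight_names is an assumption based on the Llama-style FFN structure, since the real factory value is not shown here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Qwen2FFNIntermediateLayerDescriptorSketch:
    """Illustrative stand-in mirroring the documented __init__ signature."""
    down_proj_name: str = "mlp.down_proj"
    # Template resolved per decoder layer, e.g. layer 3 -> "model.layers.3.mlp".
    ffn_prefix_name: str = "model.layers.{layer_idx}.mlp"
    # Assumed factory default: the two input projections of the Llama-style FFN.
    linear_weight_names: List[str] = field(
        default_factory=lambda: ["gate_proj", "up_proj"]
    )

desc = Qwen2FFNIntermediateLayerDescriptorSketch()
prefix = desc.ffn_prefix_name.format(layer_idx=3)  # "model.layers.3.mlp"
```

Resolving the prefix template per layer index is how a descriptor like this can address the gate_proj/up_proj/down_proj weights of each decoder layer by name.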
- class Qwen2ModelDescriptor
Bases: ModelDescriptor
Model descriptor for Qwen2 models.
- static attn_no_op_post_init(decoder_layer)
- Parameters:
decoder_layer (Module)
- static block_config_to_layer_overrides(block_config)
- Parameters:
block_config (BlockConfig)
- classmethod create_dummy_block(original_layer, block_index)
Create a dummy block that preserves Qwen2-specific attributes like attention_type.
Qwen2’s forward pass accesses decoder_layer.attention_type for attention mask selection.
- Parameters:
original_layer (Module)
block_index (int)
- Return type:
Module
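The docstring above pins down the one Qwen2-specific requirement: the replacement block must still expose attention_type, because Qwen2's forward pass reads it to pick the attention mask. The sketch below is a hypothetical illustration of that contract using plain Python classes in place of torch.nn.Module; it is not the library's implementation.

```python
class _FakeQwen2DecoderLayer:
    """Minimal stand-in for a Qwen2 decoder layer (hypothetical)."""
    def __init__(self, attention_type: str):
        self.attention_type = attention_type

class _DummyBlock:
    """Identity block that preserves Qwen2-specific attributes."""
    def __init__(self, original_layer):
        # Copy the attribute Qwen2's forward pass inspects for mask selection;
        # without it, the pruned model would fail at attribute lookup.
        self.attention_type = getattr(original_layer, "attention_type", None)

    def __call__(self, hidden_states, *args, **kwargs):
        # A dummy block passes activations through unchanged.
        return hidden_states

original = _FakeQwen2DecoderLayer(attention_type="full_attention")
dummy = _DummyBlock(original)
```

The key design point is that a no-op block cannot be a bare identity module: it must carry over any per-layer metadata the parent model's forward logic reads.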
- static decoder_layer_cls()
- static final_norm_name()
- static init_rotary_embedding(model, runtime)
- Parameters:
model (Module)
- static input_embedding_name()
- static layer_block_name(index)
- Parameters:
index (int)
- static layer_name_predicates(num_layers)
- Parameters:
num_layers (int)
- Return type:
Dict[str, Pattern]
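To make the Dict[str, Pattern] return type concrete, here is a hedged sketch of what such a predicate table could look like: one compiled regex per decoder layer, keyed by a per-layer name, matching that layer's parameter names. The key format and pattern details are assumptions; the library's actual keys and regexes may differ.

```python
import re
from typing import Dict, Pattern

def layer_name_predicates_sketch(num_layers: int) -> Dict[str, Pattern]:
    """Hypothetical predicate table: one pattern per decoder layer index."""
    return {
        # Anchor on the full "model.layers.<i>." prefix so layer 1 does not
        # also match layers 10..19.
        f"layer_{i}": re.compile(rf"^model\.layers\.{i}\.")
        for i in range(num_layers)
    }

preds = layer_name_predicates_sketch(2)
```

A table like this lets pruning code route each named parameter (e.g. "model.layers.0.mlp.down_proj") to the layer-level decision that applies to it.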
- static mlp_no_op_post_init(decoder_layer)
- Parameters:
decoder_layer (Module)
- static output_embedding_name()