gpt_oss_model_descriptor
GPT-OSS model descriptor for AnyModel compression.
Classes

- GptOssModelDescriptor: Model descriptor for GPT-OSS (pure MoE model).
- GptOssExpertRemovalLayerDescriptor: GPT-OSS MoE layer descriptor for expert removal.
- class GptOssExpertRemovalLayerDescriptor
Bases: ExpertRemovalLayerDescriptor
GPT-OSS MoE layer descriptor for expert removal.
Note: This only works for unquantized models (e.g., test models). Production GPT-OSS models use MXFP4 quantization with fused experts (_blocks, _scales, _bias), which requires a different approach.
Structure:
- Router: mlp.router with .weight and .bias
- Experts: mlp.experts.{idx}.{gate_up_proj,down_proj} with .weight and .bias
- __init__(target_name='mlp', moe_prefix_name='model.layers.{layer_idx}.mlp', expert_prefix_name='experts', router_weights=<factory>, router_biases=<factory>, expert_weights=<factory>, expert_biases=<factory>, is_fused_experts=True, fused_expert_weights=<factory>)
- Parameters:
target_name (str)
moe_prefix_name (str)
expert_prefix_name (str)
router_weights (List[str])
router_biases (List[str])
expert_weights (List[str])
expert_biases (List[str])
is_fused_experts (bool)
fused_expert_weights (List[str])
- Return type:
None
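To make the naming fields concrete, here is a short sketch of how the default field values compose into full parameter names for one layer and one expert (per-expert format). The composition logic is illustrative, not the library's code; the field values mirror the documented defaults and structure.

```python
# Field values as documented above (per-expert, unfused format).
moe_prefix_name = "model.layers.{layer_idx}.mlp"
expert_prefix_name = "experts"
expert_weights = ["gate_up_proj.weight", "down_proj.weight"]

def expert_param_names(layer_idx: int, expert_idx: int) -> list:
    """Resolve full per-expert parameter names for a given layer/expert."""
    moe_prefix = moe_prefix_name.format(layer_idx=layer_idx)
    expert_prefix = f"{moe_prefix}.{expert_prefix_name}.{expert_idx}"
    return [f"{expert_prefix}.{name}" for name in expert_weights]

# expert_param_names(0, 3) yields names like
# 'model.layers.0.mlp.experts.3.gate_up_proj.weight'
```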
- expert_biases: List[str]
Per-expert bias names relative to expert_prefix (per-expert format).
- expert_prefix_name: str = 'experts'
Expert prefix relative to moe_prefix with {expert_idx} placeholder, e.g. experts.{expert_idx}.
- expert_weights: List[str]
Per-expert weight names relative to expert_prefix (per-expert format).
- fused_expert_weights: List[str]
Fused expert weight names relative to moe_prefix, e.g. ["experts.gate_up_proj", "experts.down_proj"].
- get_modules_names_to_hook(model)
- Return type:
List[Tuple[int, str]]
- is_fused_experts: bool = True
If True, experts are stored as single fused tensors (shape [num_experts, ...]).
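In the fused layout, removing an expert amounts to dropping one slice along the leading (expert) dimension of each fused weight. A minimal sketch, with nested lists standing in for tensors (illustrative only; the library operates on real tensors):

```python
def remove_expert(fused_weight, expert_idx):
    """Drop expert `expert_idx` from a fused [num_experts, ...] weight
    by deleting its row along axis 0."""
    return [row for i, row in enumerate(fused_weight) if i != expert_idx]

# Fused gate_up_proj with num_experts = 3; removing expert 1 leaves 2.
gate_up = [["e0"], ["e1"], ["e2"]]
pruned = remove_expert(gate_up, 1)
```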
- moe_prefix_name: str = 'model.layers.{layer_idx}.mlp'
MoE prefix layer name with {layer_idx} placeholder, e.g. model.layers.{layer_idx}.moe.
- router_biases: List[str]
Router bias names relative to moe_prefix.
- router_weights: List[str]
Router weight names relative to moe_prefix.
- target_name: str = 'mlp'
Module name for hook registration; supports a regex: prefix.
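The exact matching semantics of the regex: prefix are defined by the library; the sketch below shows one plausible interpretation (an assumption, not the implementation): a plain target_name matches module names by suffix, while a regex:-prefixed one is treated as a regular expression.

```python
import re

def matches_target(target_name: str, module_name: str) -> bool:
    """Plausible matching rule (illustrative): 'regex:' switches from
    suffix matching to regular-expression search."""
    if target_name.startswith("regex:"):
        return re.search(target_name[len("regex:"):], module_name) is not None
    return module_name.endswith(target_name)

# Plain name matches the 'mlp' sublayer by suffix; the regex form can
# constrain the full path.
matches_target("mlp", "model.layers.0.mlp")
matches_target(r"regex:layers\.\d+\.mlp$", "model.layers.7.mlp")
```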
- class GptOssModelDescriptor
Bases: ModelDescriptor
Model descriptor for GPT-OSS (pure MoE model).
- static attn_no_op_post_init(decoder_layer)
Replace attention sublayers with no-op modules.
- static block_config_to_layer_overrides(block_config)
Map BlockConfig to layer constructor overrides.
- Parameters:
block_config (BlockConfig)
- classmethod create_dummy_block(original_layer, block_index)
- Parameters:
original_layer (GptOssDecoderLayer)
block_index (int)
- Return type:
Module
- static decoder_layer_cls()
Get the decoder layer class for GPT-OSS models.
GPT-OSS is a standard transformers model in recent versions. Import directly from transformers.models.gpt_oss.modeling_gpt_oss.
- static final_norm_name()
- static init_rotary_embedding(model, runtime)
Initialize rotary embeddings on the correct device.
- static input_embedding_name()
- static layer_block_name(index)
- Parameters:
index (int)
- static layer_name_predicates(num_layers)
Define regex patterns for grouping weights into subblocks.
- Parameters:
num_layers (int)
- Return type:
Dict[str, Pattern]
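A hedged sketch of what such a predicate dict can look like: one compiled regex per decoder layer, matching any weight name under that layer's prefix. The grouping keys and exact patterns here are assumptions for illustration; the real ones are defined by the library.

```python
import re

def layer_name_predicates(num_layers: int):
    """Illustrative Dict[str, Pattern]: group weights into per-layer
    subblocks by prefix (keys 'layer_0', 'layer_1', ... are hypothetical)."""
    return {
        f"layer_{i}": re.compile(rf"^model\.layers\.{i}\.")
        for i in range(num_layers)
    }

preds = layer_name_predicates(2)
# preds["layer_1"] matches 'model.layers.1.mlp.router.weight'
# but not 'model.layers.10.mlp.router.weight'.
```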
- static mlp_no_op_post_init(decoder_layer)
Replace MLP sublayers with no-op modules.
Note: GPT-OSS MoE layers return (hidden_states, router_scores), so the no-op replacement must also return a 2-tuple.
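The tuple requirement can be sketched with a minimal pure-Python stand-in (the actual replacement is a torch module; this class and its None router score are illustrative assumptions):

```python
class NoOpMoE:
    """Illustrative no-op MoE replacement. Because GPT-OSS MoE layers
    return (hidden_states, router_scores), the stand-in must return a
    2-tuple as well; router_scores is simply None here."""

    def __call__(self, hidden_states):
        # Pass hidden states through unchanged; no routing happens.
        return hidden_states, None

out, scores = NoOpMoE()([1.0, 2.0])
```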
- static output_embedding_name()
- static pruning_mixins()
Return available pruning mixins for GPT-OSS.
Note: Expert removal works for unquantized models (test models). Production models use MXFP4 quantization which is not yet supported.
- Return type:
Dict[str, PruningMixIn]