gpt_oss_model_descriptor

GPT-OSS model descriptor for AnyModel compression.

Classes

GptOssModelDescriptor

Model descriptor for GPT-OSS (pure MoE model).

GptOssExpertRemovalLayerDescriptor

GPT-OSS MoE layer descriptor for expert removal.

class GptOssExpertRemovalLayerDescriptor

Bases: ExpertRemovalLayerDescriptor

GPT-OSS MoE layer descriptor for expert removal.

Note: This only works for unquantized models (e.g., test models). Production GPT-OSS models use MXFP4 quantization with fused experts (_blocks, _scales, _bias), which requires a different approach.

Structure:

  • Router: mlp.router with .weight and .bias

  • Experts: mlp.experts.{idx}.{gate_up_proj,down_proj} with .weight and .bias

__init__(target_name='mlp', moe_prefix_name='model.layers.{layer_idx}.mlp', expert_prefix_name='experts', router_weights=<factory>, router_biases=<factory>, expert_weights=<factory>, expert_biases=<factory>, is_fused_experts=True, fused_expert_weights=<factory>)
Parameters:
  • target_name (str)

  • moe_prefix_name (str)

  • expert_prefix_name (str)

  • router_weights (List[str])

  • router_biases (List[str])

  • expert_weights (List[str])

  • expert_biases (List[str])

  • is_fused_experts (bool)

  • fused_expert_weights (List[str])

Return type:

None
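A minimal sketch of how the descriptor's name templates compose into full parameter names. The defaults (moe_prefix_name, expert_prefix_name) come from the signature above; the router_weights and expert_weights lists are illustrative placeholders, not the library's actual factory defaults.

```python
# Illustrative composition of the descriptor's name templates.
# The prefix defaults match the signature above; the weight lists
# below are assumed examples, not the real factory defaults.
moe_prefix_name = "model.layers.{layer_idx}.mlp"
expert_prefix_name = "experts"
router_weights = ["router.weight"]            # relative to moe_prefix
expert_weights = ["gate_up_proj.weight",       # relative to expert prefix
                  "down_proj.weight"]

def resolve_names(layer_idx, expert_idx):
    # Fill the {layer_idx} placeholder to get the per-layer MoE prefix.
    moe_prefix = moe_prefix_name.format(layer_idx=layer_idx)
    # Router tensors hang directly off the MoE prefix.
    names = [f"{moe_prefix}.{w}" for w in router_weights]
    # Per-expert tensors hang off moe_prefix.experts.{expert_idx}.
    expert_prefix = f"{moe_prefix}.{expert_prefix_name}.{expert_idx}"
    names += [f"{expert_prefix}.{w}" for w in expert_weights]
    return names

resolve_names(0, 3)
# → ["model.layers.0.mlp.router.weight",
#    "model.layers.0.mlp.experts.3.gate_up_proj.weight",
#    "model.layers.0.mlp.experts.3.down_proj.weight"]
```

In the fused case (is_fused_experts=True), the expert tensors are instead addressed relative to moe_prefix as single stacked tensors, e.g. experts.gate_up_proj, as described under fused_expert_weights below.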

expert_biases: List[str]

Per-expert bias names relative to expert_prefix (per-expert format).

expert_prefix_name: str = 'experts'

Expert prefix relative to moe_prefix with {expert_idx} placeholder, e.g. experts.{expert_idx}.

expert_weights: List[str]

Per-expert weight names relative to expert_prefix (per-expert format).

fused_expert_weights: List[str]

Fused expert weight names relative to moe_prefix, e.g. ["experts.gate_up_proj", "experts.down_proj"].

get_modules_names_to_hook(model)
Return type:

List[Tuple[int, str]]

is_fused_experts: bool = True

If True, experts are stored as single fused tensors (shape [num_experts, ...]).

moe_prefix_name: str = 'model.layers.{layer_idx}.mlp'

MoE prefix layer name with {layer_idx} placeholder, e.g. model.layers.{layer_idx}.moe.

router_biases: List[str]

Router bias names relative to moe_prefix.

router_weights: List[str]

Router weight names relative to moe_prefix.

target_name: str = 'mlp'

Module name used for hook registration; supports the regex: prefix for pattern matching.
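The matching semantics of target_name live inside the library; the sketch below only illustrates the documented convention that a regex: prefix switches from literal matching to a regular-expression pattern. The suffix match for the literal case is an assumption for illustration.

```python
import re

# Hedged sketch of the assumed target_name convention: a plain name
# like "mlp" is matched literally (here: as a module-name suffix,
# an assumption), while a "regex:" prefix treats the remainder as a
# regular expression over full module names.
def matches_target(target_name: str, module_name: str) -> bool:
    if target_name.startswith("regex:"):
        pattern = target_name[len("regex:"):]
        return re.fullmatch(pattern, module_name) is not None
    return module_name.endswith(target_name)

matches_target("mlp", "model.layers.0.mlp")             # → True
matches_target(r"regex:.*\.mlp", "model.layers.0.mlp")  # → True
matches_target(r"regex:.*\.mlp", "model.layers.0.attn") # → False
```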

class GptOssModelDescriptor

Bases: ModelDescriptor

Model descriptor for GPT-OSS (pure MoE model).

static attn_no_op_post_init(decoder_layer)

Replace attention sublayers with no-op modules.

static block_config_to_layer_overrides(block_config)

Map BlockConfig to layer constructor overrides.

Parameters:

block_config (BlockConfig)

classmethod create_dummy_block(original_layer, block_index)
Parameters:
  • original_layer (GptOssDecoderLayer)

  • block_index (int)

Return type:

Module

static decoder_layer_cls()

Get the decoder layer class for GPT-OSS models.

In recent versions, GPT-OSS is a standard transformers model, so the class is imported directly from transformers.models.gpt_oss.modeling_gpt_oss.

static final_norm_name()
static init_rotary_embedding(model, runtime)

Initialize rotary embeddings on the correct device.

static input_embedding_name()
static layer_block_name(index)
Parameters:

index (int)

static layer_name_predicates(num_layers)

Define regex patterns for grouping weights into subblocks.

Parameters:

num_layers (int)

Return type:

Dict[str, Pattern]

static mlp_no_op_post_init(decoder_layer)

Replace MLP sublayers with no-op modules.

Note: GPT-OSS MoE layers return (hidden_states, router_scores), so the no-op replacement must also return a 2-tuple.
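The note above can be illustrated with a framework-agnostic sketch (not the library's implementation; in practice this would be a torch.nn.Module): because the original MoE layer returns (hidden_states, router_scores), a stand-in that returned only hidden_states would break callers that unpack two values.

```python
# Hedged sketch of a no-op MoE replacement. Names and structure are
# illustrative only; the real post-init hook lives in the library.
class NoOpMoe:
    """Identity stand-in for a GPT-OSS MoE sublayer."""

    def __call__(self, hidden_states):
        # Pass hidden states through unchanged, and emit placeholder
        # router scores so downstream code that unpacks a 2-tuple
        # (hidden_states, router_scores) keeps working.
        return hidden_states, None

hidden, scores = NoOpMoe()([1.0, 2.0])  # hidden == [1.0, 2.0], scores is None
```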

static output_embedding_name()
static pruning_mixins()

Return available pruning mixins for GPT-OSS.

Note: Expert removal works for unquantized models (test models). Production models use MXFP4 quantization which is not yet supported.

Return type:

Dict[str, PruningMixIn]