expert_removal_pruning_mixin

Classes

ExpertRemovalLayerDescriptor

Descriptor for expert-removal pruning layers.

ExpertRemovalPruningMixIn

Mix-in that prunes MoE layers by removing experts.

class ExpertRemovalLayerDescriptor

Bases: LayerDescriptor

Descriptor for expert-removal pruning layers.

__init__(target_name, moe_prefix_name, expert_prefix_name='', router_weights=<factory>, router_biases=<factory>, expert_weights=<factory>, expert_biases=<factory>, is_fused_experts=False, fused_expert_weights=<factory>)
Parameters:
  • target_name (str)

  • moe_prefix_name (str)

  • expert_prefix_name (str)

  • router_weights (List[str])

  • router_biases (List[str])

  • expert_weights (List[str])

  • expert_biases (List[str])

  • is_fused_experts (bool)

  • fused_expert_weights (List[str])

Return type:

None

expert_biases: List[str]

Per-expert bias names relative to expert_prefix (per-expert format).

expert_prefix(layer_idx, expert_idx)
Parameters:
  • layer_idx (int)

  • expert_idx (int)

Return type:

str

expert_prefix_name: str = ''

Expert prefix relative to moe_prefix with {expert_idx} placeholder, e.g. experts.{expert_idx}.

expert_weights: List[str]

Per-expert weight names relative to expert_prefix (per-expert format).

fused_expert_weights: List[str]

Fused expert weight names relative to moe_prefix, e.g. ["experts.gate_up_proj", "experts.down_proj"].

is_fused_experts: bool = False

If True, experts are stored as single fused tensors (shape [num_experts, ...]).
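
As a hedged illustration of the fused layout (not taken from this module's implementation): removing experts from a fused tensor amounts to selecting the surviving rows along the leading num_experts axis. Plain nested lists stand in for real weight tensors here.

```python
# Illustrative sketch only: a fused expert weight of shape [num_experts, ...],
# modeled with nested lists instead of real tensors.
num_experts = 4
fused_down_proj = [[e * 10 + i for i in range(3)] for e in range(num_experts)]

# Expert removal keeps a subset of rows along the leading expert axis.
keep = [0, 2]  # indices of experts that survive pruning (hypothetical choice)
pruned = [fused_down_proj[e] for e in keep]  # new leading dim: len(keep)
```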

module_name_regex()
Return type:

str

moe_prefix(layer_idx)
Parameters:

layer_idx (int)

Return type:

str

moe_prefix_name: str

MoE layer name prefix with a {layer_idx} placeholder, e.g. model.layers.{layer_idx}.moe.

router_biases: List[str]

Router bias names relative to moe_prefix.

router_weights: List[str]

Router weight names relative to moe_prefix.

target_name: str

Module name used for hook registration; the regex: prefix is supported for pattern matching.
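
Putting the prefix fields together, a hedged sketch of how the templates expand into full state-dict keys. The template strings and weight names below are illustrative, and the stand-in functions show the presumed behavior of moe_prefix and expert_prefix, not their actual implementation.

```python
# Hypothetical templates following the placeholder conventions documented above.
moe_prefix_name = "model.layers.{layer_idx}.moe"
expert_prefix_name = "experts.{expert_idx}"
router_weights = ["gate.weight"]                         # relative to moe_prefix
expert_weights = ["up_proj.weight", "down_proj.weight"]  # relative to expert_prefix

def moe_prefix(layer_idx):
    # Presumed behavior of ExpertRemovalLayerDescriptor.moe_prefix.
    return moe_prefix_name.format(layer_idx=layer_idx)

def expert_prefix(layer_idx, expert_idx):
    # Presumed behavior of ExpertRemovalLayerDescriptor.expert_prefix.
    return f"{moe_prefix(layer_idx)}.{expert_prefix_name.format(expert_idx=expert_idx)}"

# Expand all keys for layer 0 of a two-expert model.
keys = [f"{moe_prefix(0)}.{w}" for w in router_weights]
for e in range(2):
    keys += [f"{expert_prefix(0, e)}.{w}" for w in expert_weights]
```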

class ExpertRemovalPruningMixIn

Bases: PruningMixIn

Mix-in that prunes MoE layers by removing experts.

__init__(layer_descriptor)
Parameters:

layer_descriptor (ExpertRemovalLayerDescriptor)

prune_single_layer(layer_idx, parent_state_dict, new_state_dict, original_config, new_config, mlp_init_mode, mlp_init_config, keys, **kwargs)
Parameters:
  • layer_idx (int)

  • parent_state_dict (dict)

  • new_state_dict (dict)

  • original_config (PreTrainedConfig)

  • new_config (PreTrainedConfig)

  • mlp_init_mode (MlpInitMode)

  • mlp_init_config (dict[str, Any] | None)

  • keys (dict)

Return type:

Dict[str, Tensor]

supported_hooks()
Return type:

List[Type[ForwardHook]]
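
For orientation, a hedged end-to-end sketch of what expert-removal pruning does to one layer's weights: router rows and expert tensors belonging to removed experts are dropped, and surviving entries are copied through under the same keys. The helper and all key names below are hypothetical; the actual prune_single_layer signature and behavior may differ.

```python
def prune_layer_sketch(parent_state_dict, keep_experts):
    """Hypothetical helper: retain only the experts listed in keep_experts.

    Tensors are modeled as nested lists whose leading axis is num_experts
    for both the router output rows and the fused expert weights.
    """
    new_state_dict = {}
    for key, value in parent_state_dict.items():
        if key.endswith("gate.weight") or key.endswith("experts.down_proj"):
            # Router rows / fused expert slabs: select surviving experts.
            new_state_dict[key] = [value[e] for e in keep_experts]
        else:
            # Everything else is copied through unchanged.
            new_state_dict[key] = value
    return new_state_dict

# Usage on a toy state dict (values are stand-ins for tensors).
sd = {
    "model.layers.0.moe.gate.weight": [[1], [2], [3], [4]],
    "model.layers.0.moe.experts.down_proj": [["a"], ["b"], ["c"], ["d"]],
    "model.layers.0.moe.other": [0],
}
out = prune_layer_sketch(sd, keep_experts=[1, 3])
```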