nemotron_h_model_descriptor
Classes
NemotronHExpertRemovalLayerDescriptor(target_name: str = 'mixer.gate', moe_prefix_name: str = 'backbone.layers.{layer_idx}.mixer', expert_prefix_name: str = 'experts.{expert_idx}', router_weights: List[str] = <factory>, router_biases: List[str] = <factory>, expert_weights: List[str] = <factory>, expert_biases: List[str] = <factory>, is_fused_experts: bool = False, fused_expert_weights: List[str] = <factory>)
- class NemotronHExpertRemovalLayerDescriptor
Bases:
ExpertRemovalLayerDescriptor
- __init__(target_name='mixer.gate', moe_prefix_name='backbone.layers.{layer_idx}.mixer', expert_prefix_name='experts.{expert_idx}', router_weights=<factory>, router_biases=<factory>, expert_weights=<factory>, expert_biases=<factory>, is_fused_experts=False, fused_expert_weights=<factory>)
- Parameters:
target_name (str)
moe_prefix_name (str)
expert_prefix_name (str)
router_weights (List[str])
router_biases (List[str])
expert_weights (List[str])
expert_biases (List[str])
is_fused_experts (bool)
fused_expert_weights (List[str])
- Return type:
None
- expert_prefix_name: str = 'experts.{expert_idx}'
Expert prefix relative to moe_prefix, with an {expert_idx} placeholder, e.g. experts.{expert_idx}.
- expert_weights: List[str]
Per-expert weight names relative to expert_prefix (per-expert format).
- get_modules_names_to_hook(model)
- Return type:
List[Tuple[int, str]]
- moe_prefix_name: str = 'backbone.layers.{layer_idx}.mixer'
MoE layer-name prefix with a {layer_idx} placeholder, e.g. model.layers.{layer_idx}.moe.
- router_biases: List[str]
Router bias names relative to moe_prefix.
- router_weights: List[str]
Router weight names relative to moe_prefix.
- target_name: str = 'mixer.gate'
Module name for hook registration; supports a regex: prefix.
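The field docstrings above describe a naming scheme in which the MoE prefix, the per-expert prefix, and the per-expert weight names are joined into full parameter names. A minimal sketch of that composition, assuming hypothetical weight names (`up_proj.weight`, `down_proj.weight`) and using only the template strings documented above, not the library's own API:

```python
from typing import List, Tuple


def expert_weight_names(
    layer_idx: int,
    expert_idx: int,
    moe_prefix: str = "backbone.layers.{layer_idx}.mixer",
    expert_prefix: str = "experts.{expert_idx}",
    # Hypothetical per-expert weight names, for illustration only.
    weights: Tuple[str, ...] = ("up_proj.weight", "down_proj.weight"),
) -> List[str]:
    """Compose full parameter names: moe_prefix + expert_prefix + weight."""
    base = moe_prefix.format(layer_idx=layer_idx)
    expert = expert_prefix.format(expert_idx=expert_idx)
    return [f"{base}.{expert}.{w}" for w in weights]


names = expert_weight_names(3, 7)
# e.g. "backbone.layers.3.mixer.experts.7.up_proj.weight"
```

Router weights and biases follow the same pattern but are resolved relative to moe_prefix only, without the expert prefix.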
- class NemotronHModelDescriptor
Bases:
ModelDescriptor
- static attn_no_op_post_init(decoder_layer)
- static block_config_to_layer_overrides(block_config)
- Parameters:
block_config (BlockConfig)
- classmethod create_dummy_block(original_layer, block_index)
- Parameters:
original_layer (Module)
block_index (int)
- Return type:
Module
- static decoder_layer_cls()
- static final_norm_name()
- classmethod get_weight_groups(layer_names, num_hidden_layers)
A quirk of NemotronH is that norm.weight can appear in both the block_{i}_ffn and block_{i}_attention groups; duplicate groups containing norm.weight should be removed.
- Parameters:
layer_names (Iterable[str])
num_hidden_layers (int)
- Return type:
Dict[str, List[str]]
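One plausible reading of the deduplication described above can be sketched as follows: when a layer's norm.weight appears in more than one weight group, keep it only in the first group encountered. The group keys and parameter names here are illustrative assumptions, not the library's actual output:

```python
from typing import Dict, List


def dedup_norm_weights(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep each *.norm.weight entry in only the first group that lists it."""
    seen = set()
    result: Dict[str, List[str]] = {}
    for group, names in groups.items():
        kept = []
        for name in names:
            if name.endswith("norm.weight"):
                if name in seen:
                    continue  # already claimed by an earlier group
                seen.add(name)
            kept.append(name)
        result[group] = kept
    return result


# Hypothetical groups where both blocks of layer 0 list the same norm.weight.
groups = {
    "block_0_attention": [
        "backbone.layers.0.norm.weight",
        "backbone.layers.0.mixer.q_proj.weight",
    ],
    "block_0_ffn": [
        "backbone.layers.0.norm.weight",
        "backbone.layers.0.mixer.up_proj.weight",
    ],
}
deduped = dedup_norm_weights(groups)
```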
- static init_rotary_embedding(model, runtime)
NemotronH has no positional embeddings.
- static input_embedding_name()
- static layer_block_name(index)
- Parameters:
index (int)
- static layer_name_predicates(num_layers)
- Parameters:
num_layers (int)
- Return type:
Dict[str, Pattern]
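A minimal sketch of what a mapping of name-to-compiled-pattern predicates of this shape might look like; the key format and the pattern bodies below are assumptions for illustration, not what this method actually returns:

```python
import re
from typing import Dict, Pattern


def layer_name_predicates(num_layers: int) -> Dict[str, Pattern]:
    """One regex per layer, matching that layer's parameter-name prefix."""
    return {
        f"block_{i}": re.compile(rf"backbone\.layers\.{i}\.")
        for i in range(num_layers)
    }


preds = layer_name_predicates(2)
# preds["block_1"] matches names under backbone.layers.1.*
```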
- static mlp_no_op_post_init(decoder_layer)
- static output_embedding_name()
- static pruning_mixins()
- Return type:
Dict[str, PruningMixIn]
- static requires_trust_remote_code()
- Return type:
bool