calc_subblock_params_and_memory

Calculate memory usage and parameter counts for neural network subblocks.

This module provides utilities to compute memory footprints and parameter counts for different subblock types (FFN, Attention, Mamba, MoE) in large language models, considering various data types, batch sizes, and sequence lengths.

Functions

calculate_subblock_memory

model_config and descriptor are both required (puzzletron-style); the FFN path uses them for meta initialization.

calculate_subblock_params

Count parameters on one meta decoder layer.

calc_subblock_active_params

load_moe_stats

estimate_num_active_experts

calculate_mamba_memory

calculate_mamba_state_size

calculate_ffn_memory

calculate_non_block_memory

calculate_non_block_params

calc_subblock_active_params(sublayer_config, model_config, descriptor, n_embd, moe_stats_file, batch_size, block_idx)
Return type:

int

calculate_ffn_memory(ffn_config, model_config, descriptor, weights_dtype, experts_dtype=None)
Parameters:
  • ffn_config (FFNConfig)

  • model_config (PreTrainedConfig)

  • descriptor (Type[ModelDescriptor])

  • weights_dtype (dtype | str)

  • experts_dtype (dtype | str | None)

Return type:

float

calculate_mamba_memory(attention_config, model_config, descriptor, batch_size, weights_dtype, kv_cache_dtype)
Parameters:
  • attention_config (AttentionConfig)

  • model_config (PreTrainedConfig)

  • descriptor (Type[ModelDescriptor])

  • batch_size (int)

  • weights_dtype (dtype)

  • kv_cache_dtype (dtype)

Return type:

int

calculate_mamba_state_size(mamba_config, batch_size)
Return type:

int
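
The source does not document how the state size is derived. As a rough illustration only, the sketch below counts the cached elements of one Mamba-2-style layer (a rolling convolution window plus the SSM state). The ExampleMambaConfig class, its field names, and example_mamba_state_size are hypothetical and are not taken from this module; the real mamba_config may expose different fields and the real function may count differently.

```python
from dataclasses import dataclass


@dataclass
class ExampleMambaConfig:
    """Hypothetical stand-in for the module's Mamba config (field names assumed)."""

    num_heads: int
    head_dim: int
    state_dim: int    # SSM state size per head channel
    num_groups: int   # groups that share the B and C projections
    conv_kernel: int  # length of the depthwise convolution window


def example_mamba_state_size(cfg: ExampleMambaConfig, batch_size: int) -> int:
    """Count cached state elements for one Mamba-2-style layer (assumed layout)."""
    d_inner = cfg.num_heads * cfg.head_dim
    # Convolution state: a rolling window over the x, B and C channels.
    conv_dim = d_inner + 2 * cfg.num_groups * cfg.state_dim
    conv_state = conv_dim * cfg.conv_kernel
    # SSM state: one (head_dim x state_dim) matrix per head.
    ssm_state = cfg.num_heads * cfg.head_dim * cfg.state_dim
    return batch_size * (conv_state + ssm_state)


cfg = ExampleMambaConfig(num_heads=4, head_dim=8, state_dim=16, num_groups=1, conv_kernel=4)
print(example_mamba_state_size(cfg, batch_size=2))
```

Unlike a KV cache, this count does not depend on sequence length, which is presumably why the signature takes only a config and a batch size.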

calculate_non_block_memory(n_embd, vocab_size, weight_dtype)
Parameters:
  • n_embd (int)

  • vocab_size (int)

  • weight_dtype (dtype)

Return type:

float

calculate_non_block_params(n_embd, vocab_size)
Parameters:
  • n_embd (int)

  • vocab_size (int)

Return type:

int
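
What counts as "non-block" is not spelled out in the source. A minimal sketch, assuming it means an untied token embedding, an untied output head, and one final normalization weight vector, might look like the following. The function names, the bytes_per_param argument, and the MiB unit are assumptions made for illustration; the module's own functions take a dtype and may use a different breakdown or unit.

```python
def example_non_block_params(n_embd: int, vocab_size: int) -> int:
    """Count parameters outside the decoder blocks (assumed breakdown)."""
    embedding = vocab_size * n_embd  # input token embedding
    lm_head = vocab_size * n_embd    # output projection, assumed untied
    final_norm = n_embd              # assumed single norm weight vector
    return embedding + lm_head + final_norm


def example_non_block_memory_mib(n_embd: int, vocab_size: int, bytes_per_param: int) -> float:
    """Convert the parameter count to MiB for a given element size (assumed unit)."""
    return example_non_block_params(n_embd, vocab_size) * bytes_per_param / 2**20


# 2 bytes per parameter corresponds to a 16-bit dtype such as bfloat16.
print(example_non_block_params(1024, 1024))
print(example_non_block_memory_mib(1024, 1024, bytes_per_param=2))
```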

calculate_subblock_memory(subblock_config, batch_size, prefill_seq_len, generation_seq_len, prefill_queue_size, n_embd, n_head, weights_dtype, kv_cache_dtype, allocate_prefill_query, model_config, descriptor)

model_config and descriptor are both required (puzzletron-style); the FFN path uses them for meta initialization.

Parameters:
  • subblock_config (FFNConfig | AttentionConfig)

  • batch_size (int)

  • prefill_seq_len (int)

  • generation_seq_len (int)

  • prefill_queue_size (int)

  • n_embd (int)

  • n_head (int)

  • weights_dtype (dtype)

  • kv_cache_dtype (dtype)

  • allocate_prefill_query (bool)

  • model_config (PreTrainedConfig)

  • descriptor (Type[ModelDescriptor])

Return type:

float | dict[str, float]
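
For attention subblocks, the batch size, sequence lengths, n_embd, n_head, and kv_cache_dtype arguments suggest that a KV cache term is part of the estimate. The sketch below shows the standard KV cache sizing arithmetic under stated assumptions: the cache spans prefill_seq_len + generation_seq_len tokens, head_dim equals n_embd // n_head, and the number of KV heads is passed in explicitly. The function name and the n_kv_heads and bytes_per_elem arguments are hypothetical and not part of this module; the real function may include other terms (weights, prefill queries) and may report a different unit.

```python
def example_kv_cache_bytes(
    batch_size: int,
    prefill_seq_len: int,
    generation_seq_len: int,
    n_embd: int,
    n_head: int,
    n_kv_heads: int,
    bytes_per_elem: int,
) -> int:
    """Bytes needed to cache keys and values for one attention layer (assumed layout)."""
    head_dim = n_embd // n_head
    seq_len = prefill_seq_len + generation_seq_len
    # Factor 2: one tensor for keys and one for values.
    return 2 * batch_size * seq_len * n_kv_heads * head_dim * bytes_per_elem


print(example_kv_cache_bytes(
    batch_size=1,
    prefill_seq_len=100,
    generation_seq_len=28,
    n_embd=4096,
    n_head=32,
    n_kv_heads=8,
    bytes_per_elem=2,
))
```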

calculate_subblock_params(config, layer_config, descriptor)

Count parameters on one meta decoder layer.

The caller is responsible for adjusting per-layer config fields (e.g. hybrid_override_pattern) before passing config; see ModelDescriptor.truncate_pattern_for_subblock.

Return type:

int

estimate_num_active_experts(dist_over_experts, batch_size, num_experts)
Parameters:
  • dist_over_experts (ndarray)

  • batch_size (int)

  • num_experts (int)

Return type:

int
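
The estimator itself is not documented here. One plausible approach, shown only as a sketch, treats each token in the batch as an independent draw from the routing distribution and computes the expected number of distinct experts hit: the sum over experts of 1 - (1 - p_i)^batch_size. The function name is hypothetical, the input is a plain Python sequence rather than an ndarray, and the sketch assumes one routed expert per token; the module's function returns an int and may handle top-k routing or rounding differently.

```python
from typing import Sequence


def example_expected_active_experts(dist_over_experts: Sequence[float], batch_size: int) -> float:
    """Expected count of distinct experts selected by batch_size independent tokens."""
    total = sum(dist_over_experts)
    # Normalize so the weights form a probability distribution.
    probs = [p / total for p in dist_over_experts]
    # An expert is inactive only if every token misses it: (1 - p) ** batch_size.
    return sum(1.0 - (1.0 - p) ** batch_size for p in probs)


uniform = [0.25, 0.25, 0.25, 0.25]
print(example_expected_active_experts(uniform, batch_size=1))
print(example_expected_active_experts(uniform, batch_size=64))
```

With a single token exactly one expert is active, and as the batch grows the estimate approaches the total number of experts with nonzero routing probability.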

load_moe_stats(stats_file)
Parameters:
  • stats_file (str)

Return type:

dict