calc_subblock_params_and_memory

Calculate memory usage and parameter counts for neural network subblocks.

This module provides utilities to compute memory footprints and parameter counts for different subblock types (FFN, Attention, Mamba, MoE) in large language models, considering various data types, batch sizes, and sequence lengths.

Functions

`calculate_subblock_memory`	`model_config` / `descriptor` are required (puzzletron-style); FFN uses them for meta init.
`calculate_subblock_params`	Count parameters on one meta decoder layer.
`calc_subblock_active_params`
`load_moe_stats`
`estimate_num_active_experts`
`calculate_mamba_memory`
`calculate_mamba_state_size`
`calculate_ffn_memory`
`calculate_non_block_memory`
`calculate_non_block_params`

calc_subblock_active_params(sublayer_config, model_config, descriptor, n_embd, moe_stats_file, batch_size, block_idx)

Parameters:

sublayer_config (FFNConfig | AttentionConfig)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
n_embd (int)
moe_stats_file (str)
batch_size (int)
block_idx (int)

Return type:

int
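
As a rough illustration of what "active" parameters mean for an MoE FFN, the sketch below counts the router plus only the experts expected to be used. The names `expert_params` and `active_ffn_params`, the gated-FFN shape (three bias-free `n_embd x intermediate_size` matrices per expert), and the bias-free router are all assumptions made for illustration; they are not taken from this module's implementation.

```python
def expert_params(n_embd: int, intermediate_size: int) -> int:
    """Parameters in one gated FFN expert (assumed: gate, up, down; no biases)."""
    return 3 * n_embd * intermediate_size


def active_ffn_params(
    n_embd: int,
    intermediate_size: int,
    num_experts: int,
    num_active_experts: int,
) -> int:
    """Hypothetical helper: router weights plus only the experts that are active."""
    if not 0 <= num_active_experts <= num_experts:
        raise ValueError("num_active_experts must be between 0 and num_experts")
    router = n_embd * num_experts  # assumed: one linear router without bias
    return router + num_active_experts * expert_params(n_embd, intermediate_size)


if __name__ == "__main__":
    print(active_ffn_params(n_embd=8, intermediate_size=16, num_experts=4, num_active_experts=2))
```

In this sketch the number of active experts is an input; in the documented function it would presumably be derived from `moe_stats_file`, `batch_size`, and `block_idx`.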

calculate_ffn_memory(ffn_config, model_config, descriptor, weights_dtype, experts_dtype=None)

Parameters:

ffn_config (FFNConfig)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
weights_dtype (dtype | str)
experts_dtype (dtype | str | None)

Return type:

float

calculate_mamba_memory(attention_config, model_config, descriptor, batch_size, weights_dtype, kv_cache_dtype)

Parameters:

attention_config (AttentionConfig)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
batch_size (int)
weights_dtype (dtype)
kv_cache_dtype (dtype)

Return type:

int

calculate_mamba_state_size(mamba_config, batch_size)

Parameters:

mamba_config (MambaConfig)
batch_size (int)

Return type:

int
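
The fields of `MambaConfig` are not listed here, so the following is only a sketch of how a recurrent state size could be computed under a Mamba2-style layout. The `MambaDims` dataclass and all of its field names are hypothetical, and the result is an element count; whether the documented function returns elements or bytes is not stated.

```python
from dataclasses import dataclass


@dataclass
class MambaDims:
    """Hypothetical stand-in for the relevant MambaConfig fields."""
    num_heads: int
    head_dim: int
    state_dim: int
    num_groups: int
    conv_kernel: int


def mamba_state_elements(dims: MambaDims, batch_size: int) -> int:
    """Elements held per layer: convolution state plus SSM state (assumed layout)."""
    inner = dims.num_heads * dims.head_dim
    # Assumed Mamba2-style convolution input: x plus the B and C projections.
    conv_dim = inner + 2 * dims.num_groups * dims.state_dim
    conv_state = batch_size * conv_dim * dims.conv_kernel
    ssm_state = batch_size * dims.num_heads * dims.head_dim * dims.state_dim
    return conv_state + ssm_state


if __name__ == "__main__":
    dims = MambaDims(num_heads=2, head_dim=4, state_dim=8, num_groups=1, conv_kernel=4)
    print(mamba_state_elements(dims, batch_size=3))
```

Unlike a KV cache, this state does not grow with sequence length, which is consistent with the function taking only `batch_size` besides the config.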

calculate_non_block_memory(n_embd, vocab_size, weight_dtype)

Parameters:

n_embd (int)
vocab_size (int)
weight_dtype (dtype)

Return type:

float

calculate_non_block_params(n_embd, vocab_size)

Parameters:

n_embd (int)
vocab_size (int)

Return type:

int
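
The two non-block helpers above cover the parts of the model outside the decoder layers. A minimal sketch of that kind of accounting follows, assuming the non-block part consists of an input embedding, an untied LM head, and a final norm weight, and assuming memory is reported in MiB. The function names, the `tied_embeddings` flag, and the string dtype names are illustrative assumptions; the documented functions take a `dtype` object and may count differently.

```python
# Assumed bytes per element, keyed by dtype name for this sketch only.
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "float8": 1}


def non_block_params(n_embd: int, vocab_size: int, tied_embeddings: bool = False) -> int:
    """Hypothetical count: embedding + LM head (unless tied) + final norm weight."""
    embedding = vocab_size * n_embd
    lm_head = 0 if tied_embeddings else vocab_size * n_embd
    final_norm = n_embd
    return embedding + lm_head + final_norm


def non_block_memory_mib(n_embd: int, vocab_size: int, weight_dtype: str) -> float:
    """Hypothetical memory estimate in MiB for the non-block weights."""
    total_bytes = non_block_params(n_embd, vocab_size) * DTYPE_BYTES[weight_dtype]
    return total_bytes / 2**20


if __name__ == "__main__":
    print(non_block_params(1024, 32768))
    print(non_block_memory_mib(1024, 32768, "bfloat16"))
```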

calculate_subblock_memory(subblock_config, batch_size, prefill_seq_len, generation_seq_len, prefill_queue_size, n_embd, n_head, weights_dtype, kv_cache_dtype, allocate_prefill_query, model_config, descriptor)

Both model_config and descriptor are required (puzzletron-style); the FFN path uses them for meta-device initialization.

Parameters:

subblock_config (FFNConfig | AttentionConfig)
batch_size (int)
prefill_seq_len (int)
generation_seq_len (int)
prefill_queue_size (int)
n_embd (int)
n_head (int)
weights_dtype (dtype)
kv_cache_dtype (dtype)
allocate_prefill_query (bool)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])

Return type:

float | dict[str, float]
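
For an attention subblock, much of the batch- and sequence-dependent memory is the KV cache. The sketch below shows the standard KV-cache size formula, assuming the cache must hold `prefill_seq_len + generation_seq_len` tokens per sequence and that the result is in MiB. The `n_kv_heads` and `bytes_per_element` arguments and the function name are hypothetical, and `prefill_queue_size` and `allocate_prefill_query` are not modeled because their exact effect is not documented here.

```python
def kv_cache_mib(
    batch_size: int,
    prefill_seq_len: int,
    generation_seq_len: int,
    n_embd: int,
    n_head: int,
    n_kv_heads: int,
    bytes_per_element: int,
) -> float:
    """Hypothetical KV-cache estimate for one attention layer, in MiB."""
    head_dim = n_embd // n_head
    seq_len = prefill_seq_len + generation_seq_len  # assumed total cached tokens
    # Factor 2 accounts for storing both keys and values.
    total_bytes = 2 * batch_size * seq_len * n_kv_heads * head_dim * bytes_per_element
    return total_bytes / 2**20


if __name__ == "__main__":
    print(kv_cache_mib(2, 1024, 1024, n_embd=4096, n_head=32, n_kv_heads=8, bytes_per_element=2))
```

The weight memory of the subblock would be added on top of this; it depends on `weights_dtype` and on the layer shapes taken from `subblock_config`.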

calculate_subblock_params(config, layer_config, descriptor)

Count parameters on one meta decoder layer.

The caller is responsible for adjusting per-layer config fields (e.g. hybrid_override_pattern) before passing config; see ModelDescriptor.truncate_pattern_for_subblock.

Parameters:

config (PreTrainedConfig)
layer_config (BlockConfig | FFNConfig | AttentionConfig)
descriptor (Type[ModelDescriptor])

Return type:

int

estimate_num_active_experts(dist_over_experts, batch_size, num_experts)

Parameters:

dist_over_experts (ndarray)
batch_size (int)
num_experts (int)

Return type:

int
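
One plausible way to turn a routing distribution into an expected number of active experts is the occupancy formula: if each of `batch_size` tokens independently picks expert i with probability p_i, expert i is used at least once with probability 1 - (1 - p_i)^batch_size. The sketch below sums that over experts. The single-expert-per-token assumption, the rounding, and the use of a plain list instead of an ndarray are assumptions made for illustration; the documented function may use a different estimator.

```python
def expected_active_experts(dist_over_experts, batch_size: int, num_experts: int) -> int:
    """Hypothetical estimator: expected count of experts hit by at least one token."""
    if len(dist_over_experts) != num_experts:
        raise ValueError("dist_over_experts must have one entry per expert")
    total = sum(dist_over_experts)
    if total <= 0:
        raise ValueError("dist_over_experts must have positive mass")
    probs = [p / total for p in dist_over_experts]  # normalize counts or weights
    # P(expert i unused) = (1 - p_i) ** batch_size under independent routing.
    expected = sum(1.0 - (1.0 - p) ** batch_size for p in probs)
    return round(expected)


if __name__ == "__main__":
    print(expected_active_experts([0.25, 0.25, 0.25, 0.25], batch_size=1, num_experts=4))
    print(expected_active_experts([1.0, 0.0, 0.0, 0.0], batch_size=10, num_experts=4))
```

With a uniform distribution the estimate approaches `num_experts` as the batch grows, while a sharply skewed distribution keeps it low even for large batches.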

load_moe_stats(stats_file)

Parameters:

stats_file (str)

Return type:

dict