calc_subblock_params_and_memory
Calculate memory usage and parameter counts for neural network subblocks.
This module provides utilities to compute memory footprints and parameter counts for different subblock types (FFN, Attention, Mamba, MoE) in large language models, considering various data types, batch sizes, and sequence lengths.
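As a rough illustration of how a parameter count and a data type combine into a weight footprint, the sketch below multiplies the element count by the bytes per element. The helper names, the dtype table, and the choice of MiB as the unit are assumptions made for this example; they are not taken from this module, whose functions may count additional tensors or report a different unit.

```python
# Hypothetical lookup table (not part of this module): bytes per element for common dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def sketch_embedding_params(n_embd: int, vocab_size: int) -> int:
    """Count the elements of a single vocab_size x n_embd embedding matrix."""
    return n_embd * vocab_size


def sketch_weight_memory_mib(num_params: int, dtype: str) -> float:
    """Weight footprint in MiB (assumed unit) for num_params elements of dtype."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**20


# Example: one embedding matrix for a 32000-token vocabulary and hidden size 4096.
params = sketch_embedding_params(n_embd=4096, vocab_size=32000)
mem_mib = sketch_weight_memory_mib(params, "bfloat16")
print(params, mem_mib)
```

The same count costs twice as much in float32 as in bfloat16, which is why the memory functions below take the weight and cache dtypes as explicit arguments.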
Functions
- calc_subblock_active_params(sublayer_config, model_config, descriptor, n_embd, moe_stats_file, batch_size, block_idx)
- Parameters:
sublayer_config (FFNConfig | AttentionConfig)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
n_embd (int)
moe_stats_file (str)
batch_size (int)
block_idx (int)
- Return type:
int
- calculate_ffn_memory(ffn_config, model_config, descriptor, weights_dtype, experts_dtype=None)
- Parameters:
ffn_config (FFNConfig)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
weights_dtype (dtype | str)
experts_dtype (dtype | str | None)
- Return type:
float
- calculate_mamba_memory(attention_config, model_config, descriptor, batch_size, weights_dtype, kv_cache_dtype)
- Parameters:
attention_config (AttentionConfig)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
batch_size (int)
weights_dtype (dtype)
kv_cache_dtype (dtype)
- Return type:
int
- calculate_mamba_state_size(mamba_config, batch_size)
- Parameters:
mamba_config (MambaConfig)
batch_size (int)
- Return type:
int
- calculate_non_block_memory(n_embd, vocab_size, weight_dtype)
- Parameters:
n_embd (int)
vocab_size (int)
weight_dtype (dtype)
- Return type:
float
- calculate_non_block_params(n_embd, vocab_size)
- Parameters:
n_embd (int)
vocab_size (int)
- Return type:
int
- calculate_subblock_memory(subblock_config, batch_size, prefill_seq_len, generation_seq_len, prefill_queue_size, n_embd, n_head, weights_dtype, kv_cache_dtype, allocate_prefill_query, model_config, descriptor)
model_config/descriptor are required (puzzletron-style); FFN uses them for meta init.
- Parameters:
subblock_config (FFNConfig | AttentionConfig)
batch_size (int)
prefill_seq_len (int)
generation_seq_len (int)
prefill_queue_size (int)
n_embd (int)
n_head (int)
weights_dtype (dtype)
kv_cache_dtype (dtype)
allocate_prefill_query (bool)
model_config (PreTrainedConfig)
descriptor (Type[ModelDescriptor])
- Return type:
float | dict[str, float]
- calculate_subblock_params(config, layer_config, descriptor)
Count parameters on one meta decoder layer.
The caller is responsible for adjusting per-layer config fields (e.g. hybrid_override_pattern) before passing config; see ModelDescriptor.truncate_pattern_for_subblock.
- Parameters:
config (PreTrainedConfig)
layer_config (BlockConfig | FFNConfig | AttentionConfig)
descriptor (Type[ModelDescriptor])
- Return type:
int
- estimate_num_active_experts(dist_over_experts, batch_size, num_experts)
- Parameters:
dist_over_experts (ndarray)
batch_size (int)
num_experts (int)
- Return type:
int
- load_moe_stats(stats_file)
- Parameters:
stats_file (str)
- Return type:
dict
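The page does not document how estimate_num_active_experts turns a routing distribution into a count. One plausible approach, shown here purely as an assumption, is the expected number of distinct experts hit when each token in the batch independently picks one expert according to the distribution. The function name below is hypothetical, it ignores top-k routing, and it returns a float rather than the int the real function reports.

```python
from typing import Sequence


def sketch_expected_active_experts(dist_over_experts: Sequence[float], batch_size: int) -> float:
    """Expected number of distinct experts selected by batch_size independent draws.

    Assumption: each token selects exactly one expert, independently of the others.
    """
    total = sum(dist_over_experts)
    # Normalize so the weights form a probability distribution.
    probs = [w / total for w in dist_over_experts]
    # An expert with probability p is missed by every token with probability (1 - p) ** batch_size.
    return sum(1.0 - (1.0 - p) ** batch_size for p in probs)


uniform = [0.25, 0.25, 0.25, 0.25]
one_token = sketch_expected_active_experts(uniform, batch_size=1)
two_tokens = sketch_expected_active_experts(uniform, batch_size=2)
print(one_token, two_tokens)
```

Under this model the estimate grows with batch size and saturates at the number of experts with nonzero probability, so skewed routing statistics yield fewer active experts than uniform ones at the same batch size.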