calc_subblock_stats

Computes memory and runtime statistics for subblocks.

Functions

  • calculate_subblock_stats

  • launch_calc_subblock_stats: Launch the calc subblock stats function with a Hydra configuration.

  • add_int8_runtime_estimates

add_int8_runtime_estimates(subblock_stats)
Parameters:
  • subblock_stats (list[dict])

Return type:

None
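
The None return type together with the list[dict] argument suggests that the entries are updated in place rather than returned. The sketch below only illustrates that calling pattern. The function add_estimates_in_place, the key names "runtime_ms" and "int8_runtime_ms", and the speedup factor are all hypothetical and are not taken from the library.

```python
# Hypothetical stand-in, NOT the library's function: it only illustrates
# the in-place update pattern implied by a `None` return type.
def add_estimates_in_place(stats: list[dict], speedup: float = 2.0) -> None:
    for entry in stats:
        # "runtime_ms" and "int8_runtime_ms" are made-up key names.
        entry["int8_runtime_ms"] = entry["runtime_ms"] / speedup


stats = [{"runtime_ms": 4.0}, {"runtime_ms": 10.0}]
result = add_estimates_in_place(stats)

# The caller keeps using `stats`; `result` carries no data.
print(result, stats)
```

With a function of this shape, the caller keeps working with the list it passed in and does not assign the return value.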

calculate_subblock_stats(calc_subblock_stats_config, teacher_dir, model_config, descriptor, master_puzzle_dir, subblock_configs, batch_size, prefill_seq_len, generation_seq_len, prefill_queue_size, n_embd, n_head, vocab_size, benchmark_iterations, use_cuda_graph, weights_dtype, activations_dtype, kv_cache_dtype, allocate_prefill_query, moe_stats_file=None)
Parameters:
  • calc_subblock_stats_config (DictConfig)

  • teacher_dir (Path)

  • model_config (PreTrainedConfig)

  • descriptor (Type[ModelDescriptor])

  • master_puzzle_dir (Path)

  • subblock_configs (list[immutabledict[str, AttentionConfig | FFNConfig]])

  • batch_size (int)

  • prefill_seq_len (int)

  • generation_seq_len (int)

  • prefill_queue_size (int)

  • n_embd (int)

  • n_head (int)

  • vocab_size (int)

  • benchmark_iterations (int | None)

  • use_cuda_graph (bool)

  • weights_dtype (dtype)

  • activations_dtype (dtype)

  • kv_cache_dtype (dtype)

  • allocate_prefill_query (bool)

  • moe_stats_file (str | Path | None)

Return type:

dict
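
Several of these parameters (batch_size, prefill_seq_len, generation_seq_len, n_embd, n_head, kv_cache_dtype) are the usual inputs to a KV-cache size estimate. The sketch below shows the standard back-of-the-envelope formula for one attention layer. It is not the library's implementation: the helper kv_cache_bytes, the n_kv_heads argument, and the bytes_per_element argument (standing in for the element size of kv_cache_dtype) are assumptions made for illustration, and the sketch ignores prefill_queue_size and allocate_prefill_query.

```python
def kv_cache_bytes(
    batch_size: int,
    prefill_seq_len: int,
    generation_seq_len: int,
    n_embd: int,
    n_head: int,
    n_kv_heads: int,          # assumed: number of key/value heads in the layer
    bytes_per_element: int,   # assumed: element size of the KV-cache dtype
) -> int:
    """Estimate KV-cache bytes for a single attention layer."""
    head_dim = n_embd // n_head
    total_tokens = prefill_seq_len + generation_seq_len
    # Factor 2: one key tensor and one value tensor per token.
    return 2 * batch_size * total_tokens * n_kv_heads * head_dim * bytes_per_element


size = kv_cache_bytes(
    batch_size=8,
    prefill_seq_len=1024,
    generation_seq_len=1024,
    n_embd=4096,
    n_head=32,
    n_kv_heads=8,
    bytes_per_element=2,  # e.g. a 16-bit floating-point cache
)
print(size / 2**20, "MiB")  # 64.0 MiB
```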

launch_calc_subblock_stats(cfg)

Launch the calc subblock stats function with a Hydra configuration.

Parameters:
  • cfg (DictConfig)

Return type:

None