calc_runtime_stats

Classes

RuntimeConfig(vocab_size: int, hidden_size: int, num_attention_heads: int, master_puzzle_dir: str, tokenizer_path: str, synth_dataset_num_requests: int, repeat_block_n_times: int, prefill_seq_len: int, generation_seq_len: int, batch_size: int, num_iters: int, num_warmup_iters: int)

Functions

calc_no_block_runtime, calc_runtime_for_subblocks, calc_subblock_runtime, create_benchmark_model, run_vllm_latency_benchmark, save_model, save_model_as_anymodel

- class RuntimeConfig
- Bases: object
- __init__(vocab_size, hidden_size, num_attention_heads, master_puzzle_dir, tokenizer_path, synth_dataset_num_requests, repeat_block_n_times, prefill_seq_len, generation_seq_len, batch_size, num_iters, num_warmup_iters)
- Parameters:
vocab_size (int)
hidden_size (int)
num_attention_heads (int)
master_puzzle_dir (str)
tokenizer_path (str)
synth_dataset_num_requests (int)
repeat_block_n_times (int)
prefill_seq_len (int)
generation_seq_len (int)
batch_size (int)
num_iters (int)
num_warmup_iters (int)
- Return type:
None
- batch_size: int
- generation_seq_len: int
- master_puzzle_dir: str
- num_attention_heads: int
- num_iters: int
- num_warmup_iters: int
- prefill_seq_len: int
- repeat_block_n_times: int
- synth_dataset_num_requests: int
- tokenizer_path: str
- vocab_size: int
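Constructing a `RuntimeConfig` is a plain dataclass call with the fields listed above. The sketch below mirrors the class with a local dataclass so it runs standalone; the field names and types come from this reference, but every value (paths, sizes, iteration counts) is an illustrative assumption.

```python
from dataclasses import dataclass

# Local mirror of RuntimeConfig for illustration only; the real class
# lives in calc_runtime_stats and has exactly these fields.
@dataclass
class RuntimeConfig:
    vocab_size: int
    hidden_size: int
    num_attention_heads: int
    master_puzzle_dir: str
    tokenizer_path: str
    synth_dataset_num_requests: int
    repeat_block_n_times: int
    prefill_seq_len: int
    generation_seq_len: int
    batch_size: int
    num_iters: int
    num_warmup_iters: int

# All values below are made-up placeholders.
config = RuntimeConfig(
    vocab_size=32000,
    hidden_size=4096,
    num_attention_heads=32,
    master_puzzle_dir="/tmp/puzzle",
    tokenizer_path="/tmp/tokenizer",
    synth_dataset_num_requests=64,
    repeat_block_n_times=10,
    prefill_seq_len=2048,
    generation_seq_len=128,
    batch_size=8,
    num_iters=5,
    num_warmup_iters=2,
)
```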
- calc_no_block_runtime(runtime_config)
- Parameters:
runtime_config (RuntimeConfig)
- Return type:
float
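`RuntimeConfig` carries both `num_iters` and `num_warmup_iters`, which suggests the measured runtime is averaged over iterations after discarding warmup. How `calc_no_block_runtime` aggregates internally is not shown in this reference, so the helper below is only a sketch of that warmup-then-average pattern.

```python
def mean_latency_after_warmup(latencies_s, num_warmup_iters):
    """Average per-iteration latency, discarding the warmup iterations.

    Illustrative helper only; this is an assumed aggregation, not the
    actual body of calc_no_block_runtime.
    """
    steady = latencies_s[num_warmup_iters:]
    return sum(steady) / len(steady)

# 2 warmup iterations (slower, e.g. due to compilation/caching)
# followed by 3 measured iterations.
runtime = mean_latency_after_warmup([0.90, 0.55, 0.50, 0.52, 0.48], 2)
```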
- calc_runtime_for_subblocks(subblock_config_set, runtime_stats_config, vocab_size, hidden_size, num_attention_heads, master_puzzle_dir, tokenizer_path, synth_dataset_num_requests, prefill_seq_len, generation_seq_len)
- Parameters:
subblock_config_set (set[SubblockConfig])
runtime_stats_config (DictConfig)
vocab_size (int)
hidden_size (int)
num_attention_heads (int)
master_puzzle_dir (str)
tokenizer_path (str)
synth_dataset_num_requests (int)
prefill_seq_len (int)
generation_seq_len (int)
- Return type:
tuple[dict[SubblockConfig, float], float]
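The return value pairs a per-subblock runtime mapping with a single float, which (given `calc_no_block_runtime` above) is plausibly the no-block baseline. A typical way to consume such a result is to subtract the baseline to get each subblock's marginal cost. The sketch below uses strings in place of `SubblockConfig` keys and made-up timings.

```python
# Hypothetical result shapes: keys would be SubblockConfig objects in the
# real code; plain strings and invented timings stand in here.
subblock_runtimes = {"attn_full": 0.82, "attn_nope": 0.64, "ffn_small": 0.71}
no_block_runtime = 0.40  # assumed meaning of the second tuple element

# Marginal cost of each subblock = its runtime minus the no-block baseline.
costs = {name: rt - no_block_runtime for name, rt in subblock_runtimes.items()}
cheapest = min(costs, key=costs.get)
```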
- calc_subblock_runtime(runtime_config, subblock_config)
- Parameters:
runtime_config (RuntimeConfig)
subblock_config (SubblockConfig)
- Return type:
float
- create_benchmark_model(vocab_size, hidden_size, num_attention_heads, prefill_seq_len, generation_seq_len, block_config, repeat_block_n_times=10)
- Parameters:
vocab_size (int)
hidden_size (int)
num_attention_heads (int)
prefill_seq_len (int)
generation_seq_len (int)
block_config (BlockConfig | None)
repeat_block_n_times (int)
- Return type:
LlamaForCausalLM
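Since the function returns a `LlamaForCausalLM` and takes `repeat_block_n_times`, it presumably builds a small model whose single block config is stacked that many times. The sketch below assembles the kind of `LlamaConfig`-style kwargs such a builder could pass; the field names follow transformers' `LlamaConfig`, but the mapping from the documented arguments is an assumption.

```python
def benchmark_model_config(vocab_size, hidden_size, num_attention_heads,
                           prefill_seq_len, generation_seq_len,
                           repeat_block_n_times=10):
    """Sketch of LlamaConfig-style kwargs a benchmark-model builder could use.

    The argument-to-field mapping is assumed, not taken from the source.
    """
    return {
        "vocab_size": vocab_size,
        "hidden_size": hidden_size,
        "num_attention_heads": num_attention_heads,
        # one copy of the repeated block per hidden layer
        "num_hidden_layers": repeat_block_n_times,
        # the model must fit the full prompt plus all generated tokens
        "max_position_embeddings": prefill_seq_len + generation_seq_len,
    }

cfg = benchmark_model_config(32000, 4096, 32, 2048, 128)
```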
- run_vllm_latency_benchmark(model_path, runtime_config)
- Parameters:
model_path (Path)
runtime_config (RuntimeConfig)
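A latency benchmark like this typically maps `RuntimeConfig` fields onto vLLM's latency-benchmark CLI. The flag names below follow vLLM's `benchmarks/benchmark_latency.py`; whether `run_vllm_latency_benchmark` actually shells out this way, and the helper name `vllm_latency_cmd`, are assumptions. A plain dict stands in for `RuntimeConfig`.

```python
from pathlib import Path

def vllm_latency_cmd(model_path, runtime_config):
    """Assemble a vLLM latency-benchmark command line (hypothetical helper).

    Flag names follow vLLM's benchmarks/benchmark_latency.py; this is a
    sketch of one plausible implementation, not the documented one.
    """
    return [
        "python", "benchmarks/benchmark_latency.py",
        "--model", str(model_path),
        "--input-len", str(runtime_config["prefill_seq_len"]),
        "--output-len", str(runtime_config["generation_seq_len"]),
        "--batch-size", str(runtime_config["batch_size"]),
        "--num-iters", str(runtime_config["num_iters"]),
        "--num-iters-warmup", str(runtime_config["num_warmup_iters"]),
    ]

cmd = vllm_latency_cmd(Path("/tmp/bench_model"), {
    "prefill_seq_len": 2048, "generation_seq_len": 128,
    "batch_size": 8, "num_iters": 5, "num_warmup_iters": 2,
})
```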
- save_model(model, tokenizer_path, output_path, num_hidden_layers)
- Parameters:
model (LlamaForCausalLM)
tokenizer_path (Path)
output_path (Path)
num_hidden_layers (int)
- Return type:
None
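One plausible reason `save_model` takes `num_hidden_layers` is that the benchmark model repeats a single block, so the saved config must be patched to report the intended depth. The sketch below shows that config-patching step in isolation on a minimal `config.json`; it is an assumption about the function's purpose, not its actual implementation.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def patch_num_hidden_layers(output_path, num_hidden_layers):
    """Rewrite num_hidden_layers in a saved Hugging Face config.json.

    Illustrative sketch only; whether save_model does this is assumed.
    """
    cfg_file = Path(output_path) / "config.json"
    cfg = json.loads(cfg_file.read_text())
    cfg["num_hidden_layers"] = num_hidden_layers
    cfg_file.write_text(json.dumps(cfg, indent=2))
    return cfg

with TemporaryDirectory() as tmp:
    # Minimal stand-in for a saved model directory.
    (Path(tmp) / "config.json").write_text(json.dumps({"num_hidden_layers": 2}))
    cfg = patch_num_hidden_layers(tmp, 10)
```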
- save_model_as_anymodel(model, output_dir, descriptor, num_hidden_layers)
- Parameters:
output_dir (Path)
num_hidden_layers (int)