runtime_utils
Utilities for runtime benchmarking and model saving in ModelOpt NAS.
This module provides classes and utility functions used for empirical runtime estimation of Transformer subblocks and for saving models and tokenizers in formats suitable for benchmarking (e.g., vLLM latency benchmark) or the AnyModel subblock-safetensors format. It defines the configuration dataclass used to parameterize runtime benchmarks, as well as model checkpointing helpers to ensure compatibility with downstream evaluation pipelines.
Classes
RuntimeConfig | Configuration for a vLLM latency benchmark run.
Functions
save_model | Save model weights as AnyModel and copy the tokenizer to output_path.
save_model_as_anymodel | Save a model checkpoint in AnyModel subblock-safetensors format.
- class RuntimeConfig
Bases: object
Configuration for a vLLM latency benchmark run.
- __init__(vocab_size, hidden_size, num_attention_heads, num_key_value_heads, tokenizer_path, repeat_block_n_times, prefill_seq_len, generation_seq_len, batch_size, num_iters, num_warmup_iters)
- Parameters:
vocab_size (int)
hidden_size (int)
num_attention_heads (int)
num_key_value_heads (int)
tokenizer_path (str)
repeat_block_n_times (int)
prefill_seq_len (int)
generation_seq_len (int)
batch_size (int)
num_iters (int)
num_warmup_iters (int)
- Return type:
None
- batch_size: int
- generation_seq_len: int
- hidden_size: int
- num_attention_heads: int
- num_iters: int
- num_key_value_heads: int
- num_warmup_iters: int
- prefill_seq_len: int
- repeat_block_n_times: int
- tokenizer_path: str
- vocab_size: int
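As a rough illustration of how the fields above fit together, the sketch below defines a local stand-in dataclass with the same eleven documented fields and fills it in. The stand-in is an assumption made so the example runs on its own: the import path of the real class is not given here, and every value shown (vocabulary size, sequence lengths, iteration counts, the tokenizer path) is a placeholder rather than a recommended setting.

```python
from dataclasses import asdict, dataclass


@dataclass
class RuntimeConfig:
    """Local stand-in mirroring the documented fields; not the ModelOpt class itself."""

    vocab_size: int
    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    tokenizer_path: str
    repeat_block_n_times: int
    prefill_seq_len: int
    generation_seq_len: int
    batch_size: int
    num_iters: int
    num_warmup_iters: int


# Placeholder values chosen only for illustration (assumptions).
config = RuntimeConfig(
    vocab_size=32000,
    hidden_size=4096,
    num_attention_heads=32,
    num_key_value_heads=8,
    tokenizer_path="path/to/tokenizer",  # hypothetical location
    repeat_block_n_times=10,
    prefill_seq_len=1024,
    generation_seq_len=128,
    batch_size=1,
    num_iters=20,
    num_warmup_iters=5,
)

# Convert to a plain dict, e.g. for logging alongside benchmark results.
config_dict = asdict(config)
print(config_dict)
```

All fields are required positional-or-keyword arguments with no defaults, so passing them by keyword, as above, keeps the call readable.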
- save_model(model, tokenizer_path, output_path)
Save model weights as AnyModel and copy the tokenizer to output_path.
- Parameters:
model (LlamaForCausalLM)
tokenizer_path (Path)
output_path (Path)
- Return type:
None
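The tokenizer-copy half of save_model can be pictured with a small standard-library sketch. This is not the library's implementation: `copy_tokenizer_files` is a hypothetical helper, and the assumption that the tokenizer is a flat directory of files (such as `tokenizer.json`) is made only for illustration. The weight-saving half is omitted because its details are not described here.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tokenizer_files(tokenizer_path: Path, output_path: Path) -> list:
    """Hypothetical helper: copy each regular file in tokenizer_path into output_path."""
    output_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(tokenizer_path.iterdir()):
        if src.is_file():
            # copy2 preserves file metadata such as modification times.
            shutil.copy2(src, output_path / src.name)
            copied.append(src.name)
    return copied


# Demonstrate on a throwaway directory with two dummy tokenizer files.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "tokenizer"
    dst_dir = Path(tmp) / "out"
    src_dir.mkdir()
    (src_dir / "tokenizer.json").write_text("{}")
    (src_dir / "tokenizer_config.json").write_text("{}")

    copied_names = copy_tokenizer_files(src_dir, dst_dir)
    dst_exists = (dst_dir / "tokenizer.json").exists()
    print(copied_names)
```

Keeping the tokenizer next to the saved weights means a downstream benchmark only needs a single directory path.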
- save_model_as_anymodel(model, output_dir, descriptor)
Save a model checkpoint in AnyModel subblock-safetensors format.
- Parameters:
output_dir (Path)