runtime_utils

Utilities for runtime benchmarking and model saving in ModelOpt NAS.

This module provides the classes and helper functions used to empirically estimate the runtime of Transformer subblocks and to save models and tokenizers in formats suitable for benchmarking (e.g., the vLLM latency benchmark) or in the AnyModel subblock-safetensors format. It defines RuntimeConfig, the configuration dataclass that parameterizes a benchmark run, and the checkpointing helpers save_model and save_model_as_anymodel, which write checkpoints for compatibility with downstream evaluation pipelines.

Classes

`RuntimeConfig`	Configuration for a vLLM latency benchmark run.

Functions

`save_model`	Save model weights as AnyModel and copy the tokenizer to `output_path`.
`save_model_as_anymodel`	Save a model checkpoint in AnyModel subblock-safetensors format.

class RuntimeConfig

Bases: object

Configuration for a vLLM latency benchmark run.

__init__(vocab_size, hidden_size, num_attention_heads, num_key_value_heads, tokenizer_path, repeat_block_n_times, prefill_seq_len, generation_seq_len, batch_size, num_iters, num_warmup_iters)

Parameters:

vocab_size (int)
hidden_size (int)
num_attention_heads (int)
num_key_value_heads (int)
tokenizer_path (str)
repeat_block_n_times (int)
prefill_seq_len (int)
generation_seq_len (int)
batch_size (int)
num_iters (int)
num_warmup_iters (int)

Return type:

None

batch_size: int

generation_seq_len: int

hidden_size: int

num_attention_heads: int

num_iters: int

num_key_value_heads: int

num_warmup_iters: int

prefill_seq_len: int

repeat_block_n_times: int

tokenizer_path: str

vocab_size: int
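
The field list above can be illustrated with a minimal sketch of how such a configuration might be built. The dataclass below is a stand-in that only mirrors the documented field names and types so that the example runs without ModelOpt installed; it is not the library's class. All values, including the tokenizer path, are placeholders, and the derived quantity `total_runs` is an assumption about how warmup and measured iterations combine, not documented behavior.

```python
from dataclasses import asdict, dataclass


@dataclass
class RuntimeConfig:
    """Stand-in mirroring the documented fields (not the ModelOpt class)."""

    vocab_size: int
    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    tokenizer_path: str
    repeat_block_n_times: int
    prefill_seq_len: int
    generation_seq_len: int
    batch_size: int
    num_iters: int
    num_warmup_iters: int


# Placeholder values chosen only for illustration.
config = RuntimeConfig(
    vocab_size=128256,
    hidden_size=4096,
    num_attention_heads=32,
    num_key_value_heads=8,
    tokenizer_path="/path/to/tokenizer",  # hypothetical path
    repeat_block_n_times=10,
    prefill_seq_len=1024,
    generation_seq_len=128,
    batch_size=1,
    num_iters=20,
    num_warmup_iters=5,
)

# Assumption: warmup iterations run before, and in addition to, measured ones.
total_runs = config.num_warmup_iters + config.num_iters

# A dataclass converts cleanly to a dict, which is handy for logging a run.
config_dict = asdict(config)
```

Every field is required and has no default in the documented signature, so each of the eleven arguments must be passed explicitly; keyword arguments keep the call readable.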

save_model(model, tokenizer_path, output_path)

Save model weights as AnyModel and copy the tokenizer to output_path.

Parameters:

model (LlamaForCausalLM)
tokenizer_path (Path)
output_path (Path)

Return type:

None
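
The tokenizer-copy half of this description can be sketched with the standard library alone. The helper `copy_tokenizer_files` below is hypothetical and is not part of this module; it assumes that `tokenizer_path` is a directory of tokenizer files and that copying every regular file in it is sufficient, which the source does not confirm. Saving the weights themselves is not shown.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tokenizer_files(tokenizer_path: Path, output_path: Path) -> list[str]:
    """Hypothetical helper: copy each regular file in tokenizer_path to output_path."""
    output_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(tokenizer_path.iterdir()):
        if src.is_file():
            # copy2 preserves file metadata such as modification times.
            shutil.copy2(src, output_path / src.name)
            copied.append(src.name)
    return copied


# Demonstration with throwaway files standing in for a real tokenizer.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "tokenizer"
    src_dir.mkdir()
    (src_dir / "tokenizer.json").write_text("{}")
    (src_dir / "tokenizer_config.json").write_text("{}")

    out_dir = Path(tmp) / "checkpoint"
    copied_names = copy_tokenizer_files(src_dir, out_dir)
    copied_exists = all((out_dir / name).exists() for name in copied_names)
```

Placing the tokenizer files alongside the weights gives a single directory that a benchmarking tool can be pointed at.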

save_model_as_anymodel(model, output_dir, descriptor)

Save a model checkpoint in AnyModel subblock-safetensors format.

Parameters:

output_dir (Path)