misc

Classes

EmptyInitOnDevice

Functions

calculate_kv_dim

Calculate the key-value dimension for grouped-query attention.

raise_unknown_subblock_config_error

Raise an error for invalid subblock configuration types.

sizeof_dtype

Return the size in bytes of the given data type.

load_json

Load and parse a JSON file.

solution_to_str

Convert a list of block configurations to a human-readable string representation.

block_config_to_str

Convert a BlockConfig to a human-readable string representation.

subblock_config_to_str

Convert a subblock config (FFN, Attention, Mamba, or MoE) to string.

class EmptyInitOnDevice

Bases: TorchFunctionMode

__init__(device=None, dtype=None)

Create tensors with given device and dtype using uninitialized memory.

Parameters:
  • device – torch.device to work with.

  • dtype – torch.dtype to work with.

Example:

with EmptyInitOnDevice("cuda", dtype=torch.bfloat16):
    model = LLaMA(model_config)
model.load_state_dict(torch.load("llama-lit/7B/lit-llama.pth"))

block_config_to_str(block_config)

Convert a BlockConfig to a human-readable string representation.

TODO: Consider a better place for this function.

Parameters:

block_config (BlockConfig | dict[str, Any] | None) – BlockConfig dataclass or dict containing attention and ffn configs.

Returns:

Formatted string with attention and FFN information, or None if input is None.

Return type:

str | None

calculate_kv_dim(num_key_value_heads, n_head, n_embd)

Calculate the key-value dimension for grouped-query attention.

Parameters:
  • num_key_value_heads (int) – Number of key-value heads.

  • n_head (int) – Total number of attention heads.

  • n_embd (int) – Embedding dimension.

Returns:

Combined dimension for key and value tensors (2 * num_key_value_heads * head_size).

Return type:

int
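Given the documented return value, a minimal sketch of the computation (assuming the per-head size is derived as n_embd // n_head):

```python
def calculate_kv_dim(num_key_value_heads: int, n_head: int, n_embd: int) -> int:
    # Per-head dimension; assumes n_embd divides evenly by n_head.
    head_size = n_embd // n_head
    # Key and value tensors each contribute num_key_value_heads * head_size.
    return 2 * num_key_value_heads * head_size
```

For example, with n_embd=4096, n_head=32, and num_key_value_heads=8, head_size is 128 and the combined K+V dimension is 2 * 8 * 128 = 2048.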

load_json(file_path)

Load and parse a JSON file.

TODO: Consider a better place for this function.

Parameters:

file_path (str) – Path to the JSON file to load.

Returns:

Parsed JSON data as a Python object, or None if the file doesn’t exist.
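A minimal sketch consistent with the described behavior (parse the file if it exists, otherwise return None):

```python
import json
import os


def load_json(file_path: str):
    """Load a JSON file, returning None if the file does not exist."""
    if not os.path.exists(file_path):
        return None
    with open(file_path) as f:
        return json.load(f)
```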

raise_unknown_subblock_config_error(subblock_config)

Raise an error for invalid subblock configuration types.

TODO: Consider a better place for this function.

Parameters:

subblock_config (Any) – The invalid subblock configuration object.

Raises:

ValueError – Always raised with a message indicating the expected types.

Return type:

None
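A sketch of the described behavior; the exact message wording and the list of expected type names are assumptions, not the module's actual text:

```python
from typing import Any


def raise_unknown_subblock_config_error(subblock_config: Any) -> None:
    # Always raises: this helper is the fall-through branch of config dispatch.
    raise ValueError(
        f"Unknown subblock config type {type(subblock_config).__name__!r}; "
        "expected FFNConfig, AttentionConfig, or dict."
    )
```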

sizeof_dtype(dtype)

Return the size in bytes of the given data type.

TODO: Consider a better place for this function.

Parameters:

dtype (dtype) – PyTorch data type or custom type string (e.g., 'nvfp4').

Returns:

Size in bytes of the data type. Special case: 'nvfp4' returns ~0.588 bytes.
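A sketch under two assumptions: the dtype argument exposes an itemsize attribute (true for torch.dtype in recent PyTorch), and the 'nvfp4' figure is the ~0.588 bytes documented above:

```python
def sizeof_dtype(dtype) -> float:
    # Special case: nvfp4 packs 4-bit values plus scaling metadata,
    # averaging roughly 0.588 bytes per element (figure taken from the docs above).
    if dtype == "nvfp4":
        return 0.588
    # Regular dtypes (e.g. torch.float32) report their size via itemsize.
    return dtype.itemsize
```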

solution_to_str(block_configs)

Convert a list of block configurations to a human-readable string representation.

TODO: Consider a better place for this function. A better home for this and the subsequent related functions would be a __repr__ method on BlockConfig, so that printing or calling str(block_config) automatically produces this formatted string.

Parameters:

block_configs (list[dict[str, Any] | BlockConfig]) – List of BlockConfig dataclasses or dicts containing layer configurations.

Returns:

Multi-line string with each block’s configuration on a separate line.

Return type:

str

subblock_config_to_str(subblock_config, subblock_name=None)

Convert a subblock config (FFN, Attention, Mamba, or MoE) to string.

Parameters:
  • subblock_config (FFNConfig | AttentionConfig | dict[str, Any] | None) – FFNConfig, AttentionConfig dataclass or dict.

  • subblock_name (None | str) – Name of subblock (‘ffn’, ‘attention’, ‘mamba’, ‘moe’). Auto-detected if subblock_config is a dataclass.

Returns:

Formatted string showing subblock type and key parameters (e.g., intermediate_size, num_key_value_heads), or None if input is None.

Return type:

str | None