misc

Classes

| EmptyInitOnDevice | Create tensors with given device and dtype using uninitialized memory. |

Functions

| calculate_kv_dim | Calculate the key-value dimension for grouped-query attention. |
| raise_unknown_subblock_config_error | Raise an error for invalid subblock configuration types. |
| sizeof_dtype | Return the size in bytes of the given data type. |
| load_json | Load and parse a JSON file. |
| solution_to_str | Convert a list of block configurations to a human-readable string representation. |
| block_config_to_str | Convert a BlockConfig to a human-readable string representation. |
| subblock_config_to_str | Convert a subblock config (FFN, Attention, Mamba, or MoE) to a string. |
- class EmptyInitOnDevice
Bases: TorchFunctionMode
- __init__(device=None, dtype=None)
Create tensors with given device and dtype using uninitialized memory.
- Parameters:
device – torch.device to work with.
dtype – torch.dtype to work with.
Example:
with EmptyInitOnDevice("cuda", dtype=torch.bfloat16):
    model = LLaMA(model_config)
    model.load_state_dict(torch.load("llama-lit/7B/lit-llama.pth"))
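The mechanism behind a context manager like this can be sketched with PyTorch's TorchFunctionMode: while the mode is active, factory calls such as torch.zeros or torch.empty are intercepted and redirected to the requested device and dtype. This is a minimal illustrative sketch, not the actual implementation; the class name EmptyInitOnDeviceSketch and the set of intercepted functions are assumptions.

```python
import torch
from torch.overrides import TorchFunctionMode


class EmptyInitOnDeviceSketch(TorchFunctionMode):
    """Sketch: redirect common tensor factory calls to a target device/dtype."""

    def __init__(self, device=None, dtype=None):
        self.device = device
        self.dtype = dtype

    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Only rewrite factory functions that accept device/dtype keywords.
        if getattr(func, "__name__", "") in {"empty", "zeros", "ones", "rand", "randn"}:
            if self.device is not None:
                kwargs.setdefault("device", self.device)
            if self.dtype is not None:
                kwargs.setdefault("dtype", self.dtype)
        # Inside the handler the mode is disabled, so this does not recurse.
        return func(*args, **kwargs)
```

The real class additionally ensures memory stays uninitialized (torch.empty semantics), which keeps model construction cheap before load_state_dict overwrites the weights.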
- block_config_to_str(block_config)
Convert a BlockConfig to a human-readable string representation.
TODO: Consider a better place for this function.
- Parameters:
block_config (BlockConfig | dict[str, Any] | None) – BlockConfig dataclass or dict containing attention and ffn configs.
- Returns:
Formatted string with attention and FFN information, or None if input is None.
- Return type:
str | None
- calculate_kv_dim(num_key_value_heads, n_head, n_embd)
Calculate the key-value dimension for grouped-query attention.
- Parameters:
num_key_value_heads (int) – Number of key-value heads.
n_head (int) – Total number of attention heads.
n_embd (int) – Embedding dimension.
- Returns:
Combined dimension for key and value tensors (2 * num_key_value_heads * head_size).
- Return type:
int
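The documented formula can be sketched directly, assuming the usual convention that head_size = n_embd // n_head (the function name is hypothetical):

```python
def calculate_kv_dim_sketch(num_key_value_heads: int, n_head: int, n_embd: int) -> int:
    # Each head spans head_size = n_embd // n_head dimensions; key and value
    # projections together contribute 2 * num_key_value_heads * head_size.
    head_size = n_embd // n_head
    return 2 * num_key_value_heads * head_size
```

For a Llama-2-7B-like shape (n_embd=4096, n_head=32) with 8 KV heads, head_size is 128 and the combined KV dimension is 2 * 8 * 128 = 2048, a quarter of the 8192 needed for full multi-head attention.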
- load_json(file_path)
Load and parse a JSON file.
TODO: Consider a better place for this function.
- Parameters:
file_path (str) – Path to the JSON file to load.
- Returns:
Parsed JSON data as a Python object, or None if the file doesn’t exist.
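The documented behavior (parse on success, None for a missing file) can be sketched as follows; the function name is illustrative:

```python
import json
from pathlib import Path


def load_json_sketch(file_path: str):
    # Return None when the file is missing, per the documented contract.
    path = Path(file_path)
    if not path.exists():
        return None
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```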
- raise_unknown_subblock_config_error(subblock_config)
Raise an error for invalid subblock configuration types.
TODO: Consider a better place for this function.
- Parameters:
subblock_config (Any) – The invalid subblock configuration object.
- Raises:
ValueError – Always raised with a message indicating the expected types.
- Return type:
None
- sizeof_dtype(dtype)
Return the size in bytes of the given data type.
TODO: Consider a better place for this function.
- Parameters:
dtype (dtype) – PyTorch data type or custom type string (e.g., ‘nvfp4’).
- Returns:
Size in bytes of the data type. Special case: ‘nvfp4’ returns ~0.588 bytes.
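A minimal sketch of this lookup, assuming standard PyTorch dtypes report their size via element_size() and that ‘nvfp4’ is the only string special case (the function name is hypothetical):

```python
import torch


def sizeof_dtype_sketch(dtype) -> float:
    # Custom string types are special-cased; nvfp4 packs values plus shared
    # scaling metadata into roughly 0.588 bytes per element on average.
    if dtype == "nvfp4":
        return 0.588
    # Standard torch dtypes: ask an empty tensor for its per-element size.
    return torch.tensor([], dtype=dtype).element_size()
```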
- solution_to_str(block_configs)
Convert a list of block configurations to a human-readable string representation.
TODO: Consider a better place for this function. A better place for this and the subsequent related functions would be a __repr__ method on the BlockConfig class, so that print(block_config) or str(block_config) automatically produces this custom formatted string.
- Parameters:
block_configs (list[dict[str, Any] | BlockConfig]) – List of BlockConfig dataclasses or dicts containing layer configurations.
- Returns:
Multi-line string with each block’s configuration on a separate line.
- Return type:
str
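The TODO's suggestion can be sketched with hypothetical dataclasses (the Sketch names and fields are illustrative, not the project's actual definitions): defining __repr__ on the block class makes str(block_config) yield the formatted line, and solution_to_str reduces to a join.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfigSketch:
    num_key_value_heads: int


@dataclass
class FFNConfigSketch:
    intermediate_size: int


@dataclass(repr=False)
class BlockConfigSketch:
    attention: AttentionConfigSketch
    ffn: FFNConfigSketch

    def __repr__(self) -> str:
        # str(block_config) now produces the custom formatted line directly.
        return (f"attention: kv_heads={self.attention.num_key_value_heads} | "
                f"ffn: intermediate_size={self.ffn.intermediate_size}")


def solution_to_str_sketch(block_configs) -> str:
    # One block's configuration per line, as documented for solution_to_str.
    return "\n".join(str(bc) for bc in block_configs)
```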
- subblock_config_to_str(subblock_config, subblock_name=None)
Convert a subblock config (FFN, Attention, Mamba, or MoE) to a string.
- Parameters:
subblock_config (FFNConfig | AttentionConfig | dict[str, Any] | None) – FFNConfig or AttentionConfig dataclass, or a dict.
subblock_name (None | str) – Name of subblock (‘ffn’, ‘attention’, ‘mamba’, ‘moe’). Auto-detected if subblock_config is a dataclass.
- Returns:
Formatted string showing subblock type and key parameters (e.g., intermediate_size, num_key_value_heads), or None if input is None.
- Return type:
str | None