block_config
Classes
- BaseDataclass: A dataclass base class with several utilities.
- SubblockConfig: Base configuration for a subblock (e.g. attention or FFN) within a transformer block.
- MoEConfig: Configuration class for Mixture of Experts parameters.
- MambaConfig: Configuration for a Mamba (state-space model) subblock.
- Llama4AttentionConfig: Configuration for Llama-4-specific attention parameters.
- AttentionConfig: Configuration for an attention subblock within a transformer block.
- FFNConfig: Configuration for a feed-forward network subblock within a transformer block.
- BlockConfig: Configuration for a single transformer block, including its attention and FFN subblocks.
Functions
- maybe_cast_block_configs: Cast a list of dicts to BlockConfig objects if needed.
- class AttentionConfig
Bases: SubblockConfig
Configuration for an attention subblock within a transformer block.
- __init__(*, no_op=False, replace_with_linear=False, sparsify=None, weights_precision='bf16', num_key_value_heads=None, llama4=None, mamba=None)
- Parameters:
no_op (bool)
replace_with_linear (bool)
sparsify (list[str] | None)
weights_precision (str | None)
num_key_value_heads (int | None)
llama4 (Llama4AttentionConfig | None)
mamba (MambaConfig | None)
- Return type:
None
- property is_llama4: bool
- property is_mamba: bool
- llama4: Llama4AttentionConfig | None = None
- mamba: MambaConfig | None = None
- num_key_value_heads: int | None = None
- to_blockconfig()
- Return type:
- class BaseDataclass
Bases: object
A dataclass base class with several utilities:
1. Comparison via string representation.
2. Initialization of dataclass fields from dicts.
3. Setting attributes even though the dataclass is frozen (only inside __post_init__).
- __init__()
- Return type:
None
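The three utilities listed above can be illustrated with a minimal sketch. This is not the library's implementation: `SketchBase`, `Inner`, and `Outer` are hypothetical stand-ins, and the way nested types are discovered (via type hints) is an assumption.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Optional, get_args, get_type_hints


class SketchBase:
    """Hypothetical stand-in for a base class like BaseDataclass."""

    def __eq__(self, other):
        # 1. Comparison via string representation.
        return str(self) == str(other)

    def __hash__(self):
        return hash(str(self))

    def __post_init__(self):
        hints = get_type_hints(type(self))
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, dict):
                continue
            # 2. Look through Optional[X] for a dataclass type to build from the dict.
            candidates = get_args(hints[f.name]) or (hints[f.name],)
            for candidate in candidates:
                if is_dataclass(candidate):
                    # 3. object.__setattr__ bypasses the frozen check.
                    object.__setattr__(self, f.name, candidate(**value))
                    break


# eq=False keeps the string-based __eq__ inherited from SketchBase.
@dataclass(frozen=True, eq=False)
class Inner(SketchBase):
    size: int = 0


@dataclass(frozen=True, eq=False)
class Outer(SketchBase):
    inner: Optional[Inner] = None


from_dict = Outer(inner={"size": 4})
from_object = Outer(inner=Inner(size=4))
```

In this sketch both instances end up holding an `Inner` object and compare equal, while ordinary attribute assignment after construction still raises because the dataclass is frozen.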
- class BlockConfig
Bases: BaseDataclass
Configuration for a single transformer block, including its attention and FFN subblocks.
- __init__(*, attention=None, ffn=None, parallel_blocks=None)
- Parameters:
attention (AttentionConfig | None)
ffn (FFNConfig | None)
parallel_blocks (list[BlockConfig] | None)
- Return type:
None
- attention: AttentionConfig | None = None
- parallel_blocks: list[BlockConfig] | None = None
- to_dict()
Convert BlockConfig to a dictionary.
- Return type:
dict
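As a rough illustration of how a block nests its subblock configurations and converts to a dictionary, the sketch below uses hypothetical stand-in dataclasses (`AttentionSketch`, `FFNSketch`, `BlockSketch`) with only a few of the documented fields. It assumes `to_dict()` behaves like `dataclasses.asdict`; the real method's output may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AttentionSketch:
    # Subset of the documented AttentionConfig fields.
    no_op: bool = False
    num_key_value_heads: Optional[int] = None


@dataclass(frozen=True)
class FFNSketch:
    # Subset of the documented FFNConfig fields.
    no_op: bool = False
    intermediate_size: Optional[int] = None


@dataclass(frozen=True)
class BlockSketch:
    attention: Optional[AttentionSketch] = None
    ffn: Optional[FFNSketch] = None

    def to_dict(self) -> dict:
        # asdict recurses into nested dataclasses (assumed behavior).
        return asdict(self)


block = BlockSketch(
    attention=AttentionSketch(num_key_value_heads=8),
    ffn=FFNSketch(intermediate_size=14336),
)
block_dict = block.to_dict()
```

Here `block_dict` is a plain nested dict with one entry per subblock, which is the kind of structure that can be serialized to JSON or YAML.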
- class FFNConfig
Bases: SubblockConfig
Configuration for a feed-forward network subblock within a transformer block.
- __init__(*, no_op=False, replace_with_linear=False, sparsify=None, weights_precision='bf16', moe=None, intermediate_size=None)
- Parameters:
no_op (bool)
replace_with_linear (bool)
sparsify (list[str] | None)
weights_precision (str | None)
moe (MoEConfig | None)
intermediate_size (int | None)
- Return type:
None
- intermediate_size: int | None = None
- property is_moe: bool
- to_blockconfig()
- Return type:
- class Llama4AttentionConfig
Bases: BaseDataclass
Configuration for Llama-4-specific attention parameters.
- __init__(*, attention_chunk_size=None, use_rope=None, use_qk_norm=None, attn_scale=None, floor_scale=None, attn_temperature_tuning=None, attention_dropout=None)
- Parameters:
attention_chunk_size (int | None)
use_rope (bool | None)
use_qk_norm (bool | None)
attn_scale (float | None)
floor_scale (float | None)
attn_temperature_tuning (bool | None)
attention_dropout (float | None)
- Return type:
None
- attention_chunk_size: int | None = None
- attention_dropout: float | None = None
- attn_scale: float | None = None
- attn_temperature_tuning: bool | None = None
- floor_scale: float | None = None
- use_qk_norm: bool | None = None
- use_rope: bool | None = None
- class MambaConfig
Bases: BaseDataclass
Configuration for a Mamba (state-space model) subblock.
- __init__(*, state_dim, num_heads, head_dim, num_groups)
- Parameters:
state_dim (int)
num_heads (int)
head_dim (int)
num_groups (int)
- Return type:
None
- head_dim: int
- num_groups: int
- num_heads: int
- state_dim: int
- class MoEConfig
Bases: BaseDataclass
Configuration class for Mixture of Experts parameters.
- __init__(*, num_local_experts=8, num_experts_per_tok=1, expert_intermediate_dim=8192, shared_expert_intermediate_dim=8192)
- Parameters:
num_local_experts (int)
num_experts_per_tok (int)
expert_intermediate_dim (int)
shared_expert_intermediate_dim (int)
- Return type:
None
- expert_intermediate_dim: int = 8192
- num_experts_per_tok: int = 1
- num_local_experts: int = 8
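The defaults in the signature above can be mirrored in a small stand-in to show how a variant configuration might be derived. `MoESketch` is hypothetical and only copies the documented field names and default values; it is not the library class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MoESketch:
    # Field names and defaults copied from the documented __init__ signature.
    num_local_experts: int = 8
    num_experts_per_tok: int = 1
    expert_intermediate_dim: int = 8192
    shared_expert_intermediate_dim: int = 8192


default_moe = MoESketch()

# Frozen instances cannot be mutated, so derive a modified copy instead.
wide_moe = replace(default_moe, num_local_experts=16, num_experts_per_tok=2)
```

Using `dataclasses.replace` leaves `default_moe` untouched and carries over every field that is not explicitly overridden.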
- class SubblockConfig
Bases: BaseDataclass
Base configuration for a subblock (e.g. attention or FFN) within a transformer block.
- __init__(*, no_op=False, replace_with_linear=False, sparsify=None, weights_precision='bf16')
- Parameters:
no_op (bool)
replace_with_linear (bool)
sparsify (list[str] | None)
weights_precision (str | None)
- Return type:
None
- no_op: bool = False
- replace_with_linear: bool = False
- sparsify: list[str] | None = None
- abstract to_blockconfig()
Convert to a block including this subblock only.
- Return type:
- weights_precision: str | None = 'bf16'
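The abstract `to_blockconfig()` pattern could look roughly like the sketch below. All class names are hypothetical stand-ins, and the choice to mark the other subblock as `no_op=True` is an assumption about what "this subblock only" means; the library may represent the missing subblock differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubblockSketch(ABC):
    no_op: bool = False

    @abstractmethod
    def to_blockconfig(self) -> "BlockSketch":
        """Convert to a block including this subblock only."""


@dataclass(frozen=True)
class AttentionSketch(SubblockSketch):
    num_key_value_heads: Optional[int] = None

    def to_blockconfig(self) -> "BlockSketch":
        # Assumption: the FFN side is disabled with no_op=True.
        return BlockSketch(attention=self, ffn=FFNSketch(no_op=True))


@dataclass(frozen=True)
class FFNSketch(SubblockSketch):
    intermediate_size: Optional[int] = None

    def to_blockconfig(self) -> "BlockSketch":
        # Assumption: the attention side is disabled with no_op=True.
        return BlockSketch(attention=AttentionSketch(no_op=True), ffn=self)


@dataclass(frozen=True)
class BlockSketch:
    attention: Optional[AttentionSketch] = None
    ffn: Optional[FFNSketch] = None


attention_only = AttentionSketch(num_key_value_heads=8).to_blockconfig()
```

Because the base class declares the method abstract, instantiating `SubblockSketch` directly fails, and each concrete subblock has to say how it wraps itself in a block.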
- maybe_cast_block_configs(block_configs)
Cast a list of dicts to BlockConfig objects if needed.
- Parameters:
block_configs (List[BlockConfig | dict] | None) – List of BlockConfig or dict objects, or None.
- Returns:
List of BlockConfig objects, or None if input is None/empty.
- Return type:
List[BlockConfig] | None
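The documented contract (dicts are cast, existing objects pass through, None or empty input yields None) can be sketched as follows. `BlockSketch` and `maybe_cast_sketch` are hypothetical stand-ins, and building the object with `BlockSketch(**d)` is an assumption about how dicts are converted.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class BlockSketch:
    # Subblocks are kept as plain dicts here to keep the sketch short.
    attention: Optional[dict] = None
    ffn: Optional[dict] = None


def maybe_cast_sketch(
    block_configs: Optional[List[Union[BlockSketch, dict]]],
) -> Optional[List[BlockSketch]]:
    # None and empty lists both map to None, as the Returns note states.
    if not block_configs:
        return None
    return [
        bc if isinstance(bc, BlockSketch) else BlockSketch(**bc)
        for bc in block_configs
    ]


mixed = [
    {"attention": {"no_op": True}},
    BlockSketch(ffn={"intermediate_size": 1024}),
]
cast = maybe_cast_sketch(mixed)
```

A helper of this shape lets callers accept configurations loaded from JSON or YAML and configurations built in code through the same entry point.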