block_config

Classes

BaseDataclass

A dataclass base class with several utilities: string-representation comparison, field initialization from dicts, and frozen-safe attribute setting in __post_init__.

SubblockConfig

Base configuration for a subblock (e.g. attention or FFN) within a transformer block.

MoEConfig

Configuration class for Mixture of Experts parameters.

MambaConfig

Configuration for a Mamba (state-space model) subblock.

Llama4AttentionConfig

Configuration for Llama-4-specific attention parameters.

AttentionConfig

Configuration for an attention subblock within a transformer block.

FFNConfig

Configuration for a feed-forward network subblock within a transformer block.

BlockConfig

Configuration for a single transformer block, including its attention and FFN subblocks.

Functions

maybe_cast_block_configs

Cast a list of dicts to BlockConfig objects if needed.

class AttentionConfig

Bases: SubblockConfig

Configuration for an attention subblock within a transformer block.

__init__(*, no_op=False, replace_with_linear=False, sparsify=None, weights_precision='bf16', num_key_value_heads=None, llama4=None, mamba=None)
Parameters:
  • no_op (bool)

  • replace_with_linear (bool)

  • sparsify (list[str] | None)

  • weights_precision (str | None)

  • num_key_value_heads (int | None)

  • llama4 (Llama4AttentionConfig | None)

  • mamba (MambaConfig | None)

Return type:

None

property is_llama4: bool
property is_mamba: bool
llama4: Llama4AttentionConfig | None = None
mamba: MambaConfig | None = None
num_key_value_heads: int | None = None
to_blockconfig()
Return type:

BlockConfig
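The is_llama4 and is_mamba properties plausibly just check whether the corresponding subconfig is attached. A minimal self-contained sketch of that pattern (an illustrative reimplementation with simplified fields, not the library's code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MambaConfig:
    state_dim: int
    num_heads: int
    head_dim: int
    num_groups: int

@dataclass(frozen=True)
class AttentionConfig:
    no_op: bool = False
    replace_with_linear: bool = False
    num_key_value_heads: Optional[int] = None
    mamba: Optional[MambaConfig] = None

    @property
    def is_mamba(self) -> bool:
        # The subblock is a Mamba layer when a MambaConfig is attached
        return self.mamba is not None

gqa_attn = AttentionConfig(num_key_value_heads=8)
mamba_attn = AttentionConfig(
    mamba=MambaConfig(state_dim=128, num_heads=8, head_dim=64, num_groups=1)
)
print(gqa_attn.is_mamba, mamba_attn.is_mamba)  # False True
```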

class BaseDataclass

Bases: object

A dataclass base class with several utilities:

  1. Comparison via string representation.

  2. Initialization of dataclass fields from dicts.

  3. Setting attributes even though the class is frozen (but only inside __post_init__!).

__init__()
Return type:

None
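Utility 3 relies on a standard dataclasses idiom: a frozen dataclass blocks normal assignment, but object.__setattr__ can still set attributes from inside __post_init__. A minimal illustration (hypothetical class name, not the library's code):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Derived:
    x: int
    doubled: int = 0

    def __post_init__(self):
        # Normal assignment raises FrozenInstanceError on a frozen
        # dataclass; object.__setattr__ bypasses the check, which is
        # the standard way to compute derived fields at init time.
        object.__setattr__(self, "doubled", 2 * self.x)

d = Derived(3)
print(d.doubled)  # 6
try:
    d.x = 99       # mutation after init is still blocked
except FrozenInstanceError:
    print("frozen")
```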

class BlockConfig

Bases: BaseDataclass

Configuration for a single transformer block, including its attention and FFN subblocks.

__init__(*, attention=None, ffn=None, parallel_blocks=None)
Parameters:
  • attention (AttentionConfig | None)

  • ffn (FFNConfig | None)

  • parallel_blocks (list[BlockConfig] | None)

Return type:

None

attention: AttentionConfig | None = None
ffn: FFNConfig | None = None
parallel_blocks: list[BlockConfig] | None = None
to_dict()

Convert BlockConfig to a dictionary.

Return type:

dict
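Since BlockConfig nests its subblock dataclasses, to_dict likely behaves like dataclasses.asdict, recursing into nested configs. A self-contained sketch under that assumption (simplified fields, not the library's code):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class FFNConfig:
    no_op: bool = False
    intermediate_size: Optional[int] = None

@dataclass(frozen=True)
class BlockConfig:
    ffn: Optional[FFNConfig] = None

    def to_dict(self) -> dict:
        # dataclasses.asdict recurses into nested dataclasses,
        # so subblock configs become nested dicts
        return asdict(self)

cfg = BlockConfig(ffn=FFNConfig(intermediate_size=4096))
print(cfg.to_dict())
# {'ffn': {'no_op': False, 'intermediate_size': 4096}}
```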

class FFNConfig

Bases: SubblockConfig

Configuration for a feed-forward network subblock within a transformer block.

__init__(*, no_op=False, replace_with_linear=False, sparsify=None, weights_precision='bf16', moe=None, intermediate_size=None)
Parameters:
  • no_op (bool)

  • replace_with_linear (bool)

  • sparsify (list[str] | None)

  • weights_precision (str | None)

  • moe (MoEConfig | None)

  • intermediate_size (int | None)

Return type:

None

intermediate_size: int | None = None
property is_moe: bool
moe: MoEConfig | None = None
to_blockconfig()
Return type:

BlockConfig

class Llama4AttentionConfig

Bases: BaseDataclass

Configuration for Llama-4-specific attention parameters.

__init__(*, attention_chunk_size=None, use_rope=None, use_qk_norm=None, attn_scale=None, floor_scale=None, attn_temperature_tuning=None, attention_dropout=None)
Parameters:
  • attention_chunk_size (int | None)

  • use_rope (bool | None)

  • use_qk_norm (bool | None)

  • attn_scale (float | None)

  • floor_scale (float | None)

  • attn_temperature_tuning (bool | None)

  • attention_dropout (float | None)

Return type:

None

attention_chunk_size: int | None = None
attention_dropout: float | None = None
attn_scale: float | None = None
attn_temperature_tuning: bool | None = None
floor_scale: float | None = None
use_qk_norm: bool | None = None
use_rope: bool | None = None
class MambaConfig

Bases: BaseDataclass

Configuration for a Mamba (state-space model) subblock.

__init__(*, state_dim, num_heads, head_dim, num_groups)
Parameters:
  • state_dim (int)

  • num_heads (int)

  • head_dim (int)

  • num_groups (int)

Return type:

None

head_dim: int
num_groups: int
num_heads: int
state_dim: int
class MoEConfig

Bases: BaseDataclass

Configuration class for Mixture of Experts parameters.

__init__(*, num_local_experts=8, num_experts_per_tok=1, expert_intermediate_dim=8192, shared_expert_intermediate_dim=8192)
Parameters:
  • num_local_experts (int)

  • num_experts_per_tok (int)

  • expert_intermediate_dim (int)

  • shared_expert_intermediate_dim (int)

Return type:

None

expert_intermediate_dim: int = 8192
num_experts_per_tok: int = 1
num_local_experts: int = 8
shared_expert_intermediate_dim: int = 8192
class SubblockConfig

Bases: BaseDataclass

Base configuration for a subblock (e.g. attention or FFN) within a transformer block.

__init__(*, no_op=False, replace_with_linear=False, sparsify=None, weights_precision='bf16')
Parameters:
  • no_op (bool)

  • replace_with_linear (bool)

  • sparsify (list[str] | None)

  • weights_precision (str | None)

Return type:

None

no_op: bool = False
replace_with_linear: bool = False
sparsify: list[str] | None = None
abstract to_blockconfig()

Convert to a BlockConfig containing this subblock only.

Return type:

BlockConfig

weights_precision: str | None = 'bf16'
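Each subclass overrides to_blockconfig to wrap itself in a block that contains only that subblock, leaving the other slot unset. A minimal sketch of that contract (simplified fields, hypothetical implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BlockConfig:
    attention: Optional["AttentionConfig"] = None
    ffn: Optional["FFNConfig"] = None

@dataclass(frozen=True)
class AttentionConfig:
    no_op: bool = False

    def to_blockconfig(self) -> BlockConfig:
        # A block holding only this attention subblock; ffn stays None
        return BlockConfig(attention=self)

@dataclass(frozen=True)
class FFNConfig:
    no_op: bool = False

    def to_blockconfig(self) -> BlockConfig:
        # A block holding only this FFN subblock; attention stays None
        return BlockConfig(ffn=self)

block = FFNConfig().to_blockconfig()
print(block.attention is None, block.ffn is not None)  # True True
```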
maybe_cast_block_configs(block_configs)

Cast a list of dicts to BlockConfig objects if needed.

Parameters:

block_configs (List[BlockConfig | dict] | None) – List of BlockConfig or dict objects, or None.

Returns:

List of BlockConfig objects, or None if input is None/empty.

Return type:

List[BlockConfig] | None
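The documented behavior (dicts cast to BlockConfig, None/empty passed through as None) can be sketched as follows. This is a simplified stand-in: the real function presumably also casts nested subblock dicts, while this version handles only top-level keys.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass(frozen=True)
class BlockConfig:
    attention: Optional[dict] = None
    ffn: Optional[dict] = None

def maybe_cast_block_configs(
    block_configs: Optional[List[Union[BlockConfig, dict]]],
) -> Optional[List[BlockConfig]]:
    # None or an empty list passes through as None
    if not block_configs:
        return None
    # Dicts are expanded into keyword arguments; existing
    # BlockConfig instances are kept as-is
    return [
        bc if isinstance(bc, BlockConfig) else BlockConfig(**bc)
        for bc in block_configs
    ]

mixed = [{"ffn": {"intermediate_size": 4096}}, BlockConfig()]
cast = maybe_cast_block_configs(mixed)
print(all(isinstance(c, BlockConfig) for c in cast))  # True
print(maybe_cast_block_configs(None))                 # None
```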