config

Configuration classes for sparse attention optimization.

Classes

CalibrationConfig

Configuration for automatic threshold calibration using RULER dataset.

FlashSkipSoftmaxConfig

Configuration for Flash Attention-aware softmax skip sparse attention.

SparseAttentionAttributeConfig

Sparse attention attribute configuration for pattern-based module config.

SparseAttentionConfig

Base configuration for sparse attention optimization.

VSAAttributeConfig

Video Sparse Attention (VSA) attribute configuration.

VSAConfig

Configuration for Video Sparse Attention optimization.

class CalibrationConfig

Bases: ModeloptBaseConfig

Configuration for automatic threshold calibration using RULER dataset.

Calibration fits an exponential model to determine dynamic thresholds that achieve a target sparsity. The model learns parameters a and b per phase:

scale_factor = a * exp(b * target_sparsity)

At inference time, the threshold is computed as:

threshold = scale_factor / sequence_length

Key benefits:

- Target sparsity can be changed at runtime without recalibration
- Threshold automatically adapts to sequence length
- Supports independent prefill and decode phase calibration
- Exponential model provides better fit (lower RMSE)
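The two formulas above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic: the function names and the values of a and b are made up for the example, and real parameters would come from calibration.

```python
import math


def scale_factor(a: float, b: float, target_sparsity: float) -> float:
    """Exponential model from the text: a * exp(b * target_sparsity)."""
    return a * math.exp(b * target_sparsity)


def threshold(a: float, b: float, target_sparsity: float, sequence_length: int) -> float:
    """Inference-time threshold: scale_factor / sequence_length."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return scale_factor(a, b, target_sparsity) / sequence_length


# Illustrative parameters only (assumed, not calibrated values).
a, b = 2.0, 5.0
for seq_len in (4096, 32768):
    # Same a, b and target sparsity; only the sequence length changes.
    print(seq_len, threshold(a, b, target_sparsity=0.5, sequence_length=seq_len))
```

Because a and b are fixed after calibration, changing target_sparsity or the sequence length only re-evaluates these expressions, which is why no recalibration is needed.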

cache_dir: str | None
chunk_size: int
data_dir: str | None
fit_logspace: bool
max_seqlen: int
model_config = {'extra': 'forbid', 'validate_assignment': True}

Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.

num_decode_tokens: int
num_length_bins: int
samples: int
target_sparse_ratio: dict[str, float]
threshold_trials: list[float] | None
classmethod validate_chunk_size(v)

Validate chunk_size is positive or -1 (disabled).

classmethod validate_max_seqlen(v)

Validate max_seqlen is at least 1024.

classmethod validate_num_decode_tokens(v)

Validate num_decode_tokens is positive.

classmethod validate_num_length_bins(v)

Validate num_length_bins is positive.

classmethod validate_samples(v)

Validate samples is positive.

classmethod validate_target_sparse_ratio(v)

Validate target sparsity ratio dict.

classmethod validate_threshold_trials(v)

Validate threshold_trials are in valid range.
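The range checks stated above can be summarized as a minimal sketch in plain Python. The helper name and the dict-based input are hypothetical and stand in for the real validators; the checks for target_sparse_ratio and threshold_trials are omitted because their valid ranges are not given here.

```python
def check_calibration_fields(cfg: dict) -> dict:
    """Apply the documented range checks to a plain dict of field values."""
    chunk_size = cfg["chunk_size"]
    # chunk_size: positive, or -1 to disable chunking.
    if not (chunk_size > 0 or chunk_size == -1):
        raise ValueError("chunk_size must be positive or -1 (disabled)")
    # max_seqlen: at least 1024.
    if cfg["max_seqlen"] < 1024:
        raise ValueError("max_seqlen must be at least 1024")
    # The remaining integer fields must all be positive.
    for name in ("num_decode_tokens", "num_length_bins", "samples"):
        if cfg[name] <= 0:
            raise ValueError(f"{name} must be positive")
    return cfg


# Example values are illustrative only, not library defaults.
example = {
    "chunk_size": -1,
    "max_seqlen": 32768,
    "num_decode_tokens": 10,
    "num_length_bins": 4,
    "samples": 24,
}
print(check_calibration_fields(example))
```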

class FlashSkipSoftmaxConfig

Bases: SparseAttentionConfig

Configuration for Flash Attention-aware softmax skip sparse attention.

model_config = {'extra': 'forbid', 'validate_assignment': True}

Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.

sparse_cfg: dict[str | Callable, dict[str, Any]]

class SparseAttentionAttributeConfig

Bases: ModeloptBaseConfig

Sparse attention attribute configuration for pattern-based module config.

backend: str
bc: int
br: int
collect_stats: bool
dense_recent_tokens: int
dense_sink_tokens: int
enable: bool
export_sparse_softmax: bool
initial_disabled_steps: int
is_causal: bool
method: str
model_config = {'extra': 'forbid', 'validate_assignment': True}

Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.

skip_softmax_threshold: float
sparsity_m: int
sparsity_n: int
thresholds: dict[str, list[float]]
classmethod validate_backend(v)

Validate backend is pytorch or triton.

classmethod validate_block_size(v)

Validate block sizes are positive integers.

classmethod validate_dense_recent_tokens(v)

Validate dense_recent_tokens is non-negative.

classmethod validate_dense_sink_tokens(v)

Validate dense_sink_tokens is non-negative.

classmethod validate_method(v)

Validate method is a string.

classmethod validate_sparsity_m(v)

Validate sparsity_m is 4 or 8.

classmethod validate_sparsity_n(v)

Validate sparsity_n is non-negative.

validate_sparsity_n_vs_m()

Validate sparsity_n is within the supported range for the given sparsity_m.

classmethod validate_thresholds(v)

Validate thresholds is a dict of lists with valid phases and values in range (0, 1).
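As a rough illustration of three of the checks described above (backend, sparsity_m, thresholds), here is a standalone sketch. The function names are hypothetical, and the set of valid phase names is an assumption taken from the prefill and decode phases mentioned for CalibrationConfig; the actual validators may differ.

```python
VALID_BACKENDS = {"pytorch", "triton"}
VALID_SPARSITY_M = {4, 8}
# Assumption: phases are named after the prefill/decode calibration phases.
ASSUMED_PHASES = {"prefill", "decode"}


def check_backend(v: str) -> str:
    if v not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(VALID_BACKENDS)}")
    return v


def check_sparsity_m(v: int) -> int:
    if v not in VALID_SPARSITY_M:
        raise ValueError("sparsity_m must be 4 or 8")
    return v


def check_thresholds(v: dict) -> dict:
    """Require a dict of lists, known phase keys, and values strictly in (0, 1)."""
    if not isinstance(v, dict):
        raise ValueError("thresholds must be a dict")
    for phase, values in v.items():
        if phase not in ASSUMED_PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        if not isinstance(values, list):
            raise ValueError(f"thresholds[{phase!r}] must be a list")
        for x in values:
            if not 0.0 < x < 1.0:
                raise ValueError(f"threshold {x} is outside (0, 1)")
    return v


print(check_backend("triton"), check_sparsity_m(8))
print(check_thresholds({"prefill": [0.001, 0.01], "decode": [0.0001]}))
```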

class SparseAttentionConfig

Bases: ModeloptBaseConfig

Base configuration for sparse attention optimization.

This base configuration provides the common structure for all sparse attention methods and supports pattern-based layer configuration.

export_format: str | None
model_config = {'extra': 'forbid', 'validate_assignment': True}

Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.

sparse_cfg: dict[str | Callable, dict[str, Any]]

class VSAAttributeConfig

Bases: ModeloptBaseConfig

Video Sparse Attention (VSA) attribute configuration.

VSA uses a two-branch architecture optimized for video diffusion models:

1. Compression branch: Block-averaged coarse attention
2. Sparse branch: Top-K block selection for fine-grained attention
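The block bookkeeping behind the two branches can be sketched without any tensor library. This is a simplified illustration, not the VSA kernel: the helper names are hypothetical, the ceil-based block count assumes partially filled blocks are padded, the accepted top_k_ratio range of (0, 1] is assumed, and the block scores stand in for whatever the coarse branch produces.

```python
import math


def num_blocks(video_shape: tuple[int, int, int], block_size_3d: tuple[int, int, int]) -> int:
    """Count 3D blocks, rounding up per axis (assumes partial blocks are padded)."""
    return math.prod(math.ceil(s / b) for s, b in zip(video_shape, block_size_3d))


def select_top_k_blocks(block_scores: list[float], top_k_ratio: float) -> list[int]:
    """Return the indices of the highest-scoring blocks, in ascending index order."""
    if not 0.0 < top_k_ratio <= 1.0:
        raise ValueError("top_k_ratio must be in (0, 1]")  # assumed range
    k = max(1, math.ceil(top_k_ratio * len(block_scores)))
    ranked = sorted(range(len(block_scores)), key=lambda i: block_scores[i], reverse=True)
    return sorted(ranked[:k])


# Illustrative shapes: 16 frames of 32x32 tokens, split into 4x4x4 blocks.
print(num_blocks((16, 32, 32), (4, 4, 4)))

# Coarse scores for four blocks; keep the top half for fine-grained attention.
print(select_top_k_blocks([0.1, 0.9, 0.3, 0.7], top_k_ratio=0.5))
```

In this picture, the compression branch would supply one score per block, and the sparse branch would then run full attention only over the selected block indices.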

block_size_3d: tuple[int, int, int] | list[int]
collect_stats: bool
enable: bool
method: str
model_config = {'extra': 'forbid', 'validate_assignment': True}

Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.

top_k_ratio: float
classmethod validate_block_size_3d(v)

Validate 3D block size.

classmethod validate_top_k_ratio(v)

Validate top-K ratio is in valid range.

classmethod validate_video_shape(v)

Validate video shape if provided.

classmethod validate_vsa_method(v)

Validate method is 'vsa'.

video_shape: tuple[int, int, int] | list[int] | None

class VSAConfig

Bases: SparseAttentionConfig

Configuration for Video Sparse Attention optimization.

model_config = {'extra': 'forbid', 'validate_assignment': True}

Configuration for the model, which should be a dictionary conforming to pydantic.config.ConfigDict.

sparse_cfg: dict[str | Callable, dict[str, Any]]