config
Configuration classes for sparse attention optimization.
Classes
- CalibrationConfig: Configuration for automatic threshold calibration using the RULER dataset.
- FlashSkipSoftmaxConfig: Configuration for Flash Attention-aware softmax skip sparse attention.
- SparseAttentionAttributeConfig: Sparse attention attribute configuration for pattern-based module config.
- SparseAttentionConfig: Base configuration for sparse attention optimization.
- VSAAttributeConfig: Video Sparse Attention (VSA) attribute configuration.
- VSAConfig: Configuration for Video Sparse Attention optimization.
- class CalibrationConfig
Bases: ModeloptBaseConfig
Configuration for automatic threshold calibration using the RULER dataset.
Calibration fits an exponential model to determine dynamic thresholds that achieve the target sparsity. The model learns parameters a and b for each phase:
scale_factor = a * exp(b * target_sparsity)
At inference time, the threshold is computed as:
threshold = scale_factor / sequence_length
Key benefits:
  - Target sparsity can be changed at runtime without recalibration.
  - The threshold automatically adapts to sequence length.
  - Prefill and decode phases can be calibrated independently.
  - The exponential model provides a better fit (lower RMSE).
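A minimal sketch of the exponential model and the inference-time threshold, assuming (as the fit_logspace flag suggests, though this page does not confirm it) that the fit is an ordinary least-squares line on log(scale_factor) = log(a) + b * target_sparsity. The helper names fit_exponential_logspace and threshold_for are hypothetical, and the calibration points are synthetic values generated from a = 2.0 and b = 5.0, not measured data.

```python
import math


def fit_exponential_logspace(sparsities, scale_factors):
    """Fit scale_factor = a * exp(b * s) with least squares in log space.

    Taking logs turns the model into a line: log(scale_factor) = log(a) + b * s,
    so b is the ordinary least-squares slope and log(a) the intercept.
    """
    n = len(sparsities)
    logs = [math.log(y) for y in scale_factors]
    mean_s = sum(sparsities) / n
    mean_log = sum(logs) / n
    s_xx = sum((s - mean_s) ** 2 for s in sparsities)
    s_xy = sum((s - mean_s) * (y - mean_log) for s, y in zip(sparsities, logs))
    b = s_xy / s_xx
    a = math.exp(mean_log - b * mean_s)
    return a, b


def threshold_for(a, b, target_sparsity, sequence_length):
    """Inference-time threshold: scale_factor / sequence_length."""
    scale_factor = a * math.exp(b * target_sparsity)
    return scale_factor / sequence_length


# Synthetic calibration points generated from a = 2.0, b = 5.0 (illustrative only).
sparsities = [0.1, 0.3, 0.5, 0.7, 0.9]
scale_factors = [2.0 * math.exp(5.0 * s) for s in sparsities]
a, b = fit_exponential_logspace(sparsities, scale_factors)

# One fit per phase could be stored as, e.g., {"prefill": (a, b), "decode": (a, b)}.
t_4k = threshold_for(a, b, target_sparsity=0.5, sequence_length=4096)
t_8k = threshold_for(a, b, target_sparsity=0.5, sequence_length=8192)
```

Because the threshold is scale_factor / sequence_length, doubling the sequence length halves the threshold for the same target sparsity, and changing target_sparsity only re-evaluates the formula, with no refit.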
- cache_dir: str | None
- chunk_size: int
- data_dir: str | None
- fit_logspace: bool
- max_seqlen: int
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- num_decode_tokens: int
- num_length_bins: int
- samples: int
- target_sparse_ratio: dict[str, float]
- threshold_trials: list[float] | None
- classmethod validate_chunk_size(v)
Validate chunk_size is positive or -1 (disabled).
- classmethod validate_max_seqlen(v)
Validate max_seqlen is at least 1024.
- classmethod validate_num_decode_tokens(v)
Validate num_decode_tokens is positive.
- classmethod validate_num_length_bins(v)
Validate num_length_bins is positive.
- classmethod validate_samples(v)
Validate samples is positive.
- classmethod validate_target_sparse_ratio(v)
Validate target sparsity ratio dict.
- classmethod validate_threshold_trials(v)
Validate threshold_trials are in the valid range.
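The numeric constraints enforced by the CalibrationConfig validators above can be summarized in one place. The sketch below is a hypothetical plain-Python mirror of only the documented rules, not ModelOpt code; target_sparse_ratio and threshold_trials are omitted because their exact accepted ranges are not stated on this page.

```python
def check_calibration_fields(chunk_size, max_seqlen, num_decode_tokens,
                             num_length_bins, samples):
    """Return a list of violated constraints (empty list means all checks pass).

    Hypothetical helper mirroring the documented CalibrationConfig validators.
    """
    errors = []
    # validate_chunk_size: positive, or -1 to disable chunking.
    if not (chunk_size > 0 or chunk_size == -1):
        errors.append("chunk_size must be positive or -1 (disabled)")
    # validate_max_seqlen: at least 1024.
    if max_seqlen < 1024:
        errors.append("max_seqlen must be at least 1024")
    # validate_num_decode_tokens / validate_num_length_bins / validate_samples.
    for name, value in (
        ("num_decode_tokens", num_decode_tokens),
        ("num_length_bins", num_length_bins),
        ("samples", samples),
    ):
        if value <= 0:
            errors.append(f"{name} must be positive")
    return errors
```

Collecting every violation, rather than stopping at the first, mirrors how a pydantic ValidationError reports all failing fields at once.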
- class FlashSkipSoftmaxConfig
Bases: SparseAttentionConfig
Configuration for Flash Attention-aware softmax skip sparse attention.
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- sparse_cfg: dict[str | Callable, dict[str, Any]]
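Every class on this page sets model_config = {'extra': 'forbid', 'validate_assignment': True}. A minimal pydantic v2 sketch of what those two settings do, assuming pydantic v2 is installed. ToyAttributeConfig is a hypothetical stand-in, not a ModelOpt class; its single validator only imitates the validate_backend rule documented for SparseAttentionAttributeConfig.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class ToyAttributeConfig(BaseModel):
    """Hypothetical stand-in; only imitates the documented config settings."""

    # Same settings as the model_config entries on this page.
    model_config = ConfigDict(extra="forbid", validate_assignment=True)

    backend: str = "pytorch"

    @field_validator("backend")
    @classmethod
    def validate_backend(cls, v: str) -> str:
        if v not in ("pytorch", "triton"):
            raise ValueError("backend must be 'pytorch' or 'triton'")
        return v


def is_rejected(action) -> bool:
    """Return True if calling action() raises a pydantic ValidationError."""
    try:
        action()
    except ValidationError:
        return True
    return False


cfg = ToyAttributeConfig()
# extra="forbid": an unknown (misspelled) key is rejected at construction.
typo_rejected = is_rejected(lambda: ToyAttributeConfig(bakend="triton"))
# validate_assignment=True: validators also run on attribute assignment.
bad_assign_rejected = is_rejected(lambda: setattr(cfg, "backend", "cuda"))
cfg.backend = "triton"  # passes validation
```

With extra="forbid", a misspelled key fails loudly instead of being silently ignored; with validate_assignment=True, field validators also run when an attribute is changed after construction.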
- class SparseAttentionAttributeConfig
Bases: ModeloptBaseConfig
Sparse attention attribute configuration for pattern-based module config.
- backend: str
- bc: int
- br: int
- collect_stats: bool
- dense_recent_tokens: int
- dense_sink_tokens: int
- enable: bool
- export_sparse_softmax: bool
- initial_disabled_steps: int
- is_causal: bool
- method: str
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- skip_softmax_threshold: float
- sparsity_m: int
- sparsity_n: int
- thresholds: dict[str, list[float]]
- classmethod validate_backend(v)
Validate backend is pytorch or triton.
- classmethod validate_block_size(v)
Validate block sizes are positive integers.
- classmethod validate_dense_recent_tokens(v)
Validate dense_recent_tokens is non-negative.
- classmethod validate_dense_sink_tokens(v)
Validate dense_sink_tokens is non-negative.
- classmethod validate_method(v)
Validate method is a string.
- classmethod validate_sparsity_m(v)
Validate sparsity_m is 4 or 8.
- classmethod validate_sparsity_n(v)
Validate sparsity_n is non-negative.
- validate_sparsity_n_vs_m()
Validate sparsity_n is within the supported range for the given sparsity_m.
- classmethod validate_thresholds(v)
Validate thresholds is a dict of lists with valid phases and values in range (0, 1).
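The SparseAttentionAttributeConfig validators above can likewise be mirrored in plain Python. This is a hypothetical helper, not ModelOpt code, and it assumes that validate_block_size checks br and bc and that the valid phase names are "prefill" and "decode" (the phases CalibrationConfig mentions). validate_sparsity_n_vs_m is left out because the supported range for each sparsity_m is not stated on this page.

```python
VALID_BACKENDS = ("pytorch", "triton")
VALID_SPARSITY_M = (4, 8)
# Assumption: phase keys mirror CalibrationConfig's prefill/decode phases.
VALID_PHASES = ("prefill", "decode")


def check_attribute_fields(cfg):
    """Return a list of violated constraints for a dict of attribute values."""
    errors = []
    if cfg["backend"] not in VALID_BACKENDS:
        errors.append("backend must be 'pytorch' or 'triton'")
    # Assumption: validate_block_size covers the br and bc block sizes.
    for name in ("br", "bc"):
        value = cfg[name]
        if not isinstance(value, int) or value <= 0:
            errors.append(f"{name} must be a positive integer")
    for name in ("dense_sink_tokens", "dense_recent_tokens"):
        if cfg[name] < 0:
            errors.append(f"{name} must be non-negative")
    if cfg["sparsity_m"] not in VALID_SPARSITY_M:
        errors.append("sparsity_m must be 4 or 8")
    if cfg["sparsity_n"] < 0:
        errors.append("sparsity_n must be non-negative")
    # thresholds: dict of lists, valid phase keys, every value in (0, 1).
    for phase, values in cfg["thresholds"].items():
        if phase not in VALID_PHASES:
            errors.append(f"unknown phase {phase!r}")
        elif not isinstance(values, list) or not all(0 < v < 1 for v in values):
            errors.append(f"thresholds[{phase!r}] must be a list of values in (0, 1)")
    return errors
```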
- class SparseAttentionConfig
Bases: ModeloptBaseConfig
Base configuration for sparse attention optimization.
This base configuration provides the common structure for all sparse attention methods and supports pattern-based layer configuration.
- export_format: str | None
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- sparse_cfg: dict[str | Callable, dict[str, Any]]
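SparseAttentionConfig supports pattern-based layer configuration through sparse_cfg, whose keys are strings or callables. A sketch of how such a mapping could be resolved for one module name, under assumptions this page does not confirm: string keys are fnmatch-style globs, callable keys are predicates on the module name, a "default" entry is applied first, and later matches override earlier ones. resolve_module_cfg and the module names are hypothetical.

```python
from fnmatch import fnmatch
from typing import Any, Callable, Dict, Union

# Keys may be glob strings or predicates on the module name (assumed semantics).
CfgKey = Union[str, Callable[[str], bool]]


def resolve_module_cfg(module_name: str,
                       sparse_cfg: Dict[CfgKey, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge the attribute dicts of every sparse_cfg entry that matches module_name."""
    # Assumption: a "default" entry provides the baseline attributes.
    resolved = dict(sparse_cfg.get("default", {}))
    for key, attrs in sparse_cfg.items():
        if key == "default":
            continue
        matched = key(module_name) if callable(key) else fnmatch(module_name, key)
        if matched:
            # Assumption: later matching entries override earlier ones.
            resolved.update(attrs)
    return resolved


example_cfg = {
    "default": {"enable": False},
    "*self_attn*": {"enable": True, "backend": "triton"},
    # Hypothetical rule: keep the first layer dense.
    (lambda name: name.startswith("model.layers.0.")): {"enable": False},
}
```

Allowing callables next to glob strings lets a rule express conditions that a wildcard cannot, such as the layer-index check in the example.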
- class VSAAttributeConfig
Bases: ModeloptBaseConfig
Video Sparse Attention (VSA) attribute configuration.
VSA uses a two-branch architecture optimized for video diffusion models:
  1. Compression branch: block-averaged coarse attention.
  2. Sparse branch: top-K block selection for fine-grained attention.
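The two VSA branches can be illustrated on a toy token sequence. This sketch shows only the coarse scoring of the compression branch (softmax attention between block-averaged queries and keys) and the top-K selection of key blocks that the sparse branch would then attend to at token level. It simplifies the 3D blocks to 1D blocks and uses plain Python lists; the rounding of top_k_ratio * num_blocks is an assumption, and vsa_block_selection and its helpers are hypothetical.

```python
import math


def block_means(x, block_size):
    """Average consecutive rows of x (a list of vectors) in blocks of block_size."""
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        dim = len(block[0])
        out.append([sum(row[d] for row in block) / len(block) for d in range(dim)])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vsa_block_selection(q, k, block_size, top_k_ratio):
    """Return coarse block attention weights and top-K key blocks per query block."""
    qb = block_means(q, block_size)  # compression branch: pooled queries
    kb = block_means(k, block_size)  # compression branch: pooled keys
    scale = 1.0 / math.sqrt(len(q[0]))
    coarse = [softmax([dot(qi, kj) * scale for kj in kb]) for qi in qb]
    # Assumption: keep round(top_k_ratio * num_blocks) blocks, at least one.
    k_keep = max(1, round(top_k_ratio * len(kb)))
    selected = [
        sorted(range(len(kb)), key=lambda j: row[j], reverse=True)[:k_keep]
        for row in coarse
    ]
    return coarse, selected


# Toy input: 8 tokens, 2-dim vectors, blocks of 2 tokens.
q = [[1.0, 0.0]] * 8
k = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
     [2.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
coarse, selected = vsa_block_selection(q, k, block_size=2, top_k_ratio=0.5)
```

Scoring pooled blocks costs one score per block pair instead of one per token pair, which is what makes the selection step cheap relative to full attention.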
- block_size_3d: tuple[int, int, int] | list[int]
- collect_stats: bool
- enable: bool
- method: str
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- top_k_ratio: float
- classmethod validate_block_size_3d(v)
Validate 3D block size.
- classmethod validate_top_k_ratio(v)
Validate top-K ratio is in the valid range.
- classmethod validate_video_shape(v)
Validate video shape if provided.
- classmethod validate_vsa_method(v)
Validate method is ‘vsa’.
- video_shape: tuple[int, int, int] | list[int] | None
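block_size_3d tiles the video latent along three axes, so the number of blocks that top_k_ratio selects from depends on video_shape. A sketch, assuming video_shape is ordered (frames, height, width) in latent units and that partial tiles at the edges count as whole blocks; num_3d_blocks is a hypothetical helper.

```python
import math


def num_3d_blocks(video_shape, block_size_3d):
    """Count the (t, h, w) tiles needed to cover a latent of shape video_shape."""
    if len(video_shape) != 3 or len(block_size_3d) != 3:
        raise ValueError("video_shape and block_size_3d need exactly 3 entries")
    if any(b <= 0 for b in block_size_3d):
        raise ValueError("block_size_3d entries must be positive")
    # Ceiling division: a partially filled edge tile still counts as one block.
    return math.prod(-(-d // b) for d, b in zip(video_shape, block_size_3d))


total = num_3d_blocks((16, 30, 52), (4, 4, 4))  # 4 * 8 * 13 tiles
# Assumed rounding rule, with top_k_ratio = 0.25.
kept_per_query_block = max(1, round(0.25 * total))
```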
- class VSAConfig
Bases: SparseAttentionConfig
Configuration for Video Sparse Attention optimization.
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- sparse_cfg: dict[str | Callable, dict[str, Any]]