calibration

Calibration framework for sparse attention methods.

Classes

DynamicThresholdCalibrator

Dynamic threshold calibrator using an Exponential model.

RulerDatasetBuilder

Builder for RULER calibration datasets.

Functions

calibrate_sparse_attention

Calibrate sparse attention parameters for optimal sparsity.

class DynamicThresholdCalibrator

Bases: object

Dynamic threshold calibrator using an Exponential model.

Calibration Algorithm:
  1. For each threshold λ_j in threshold_trials:
     - Run ALL samples through forward_loop
     - For each sample i with length L_i, collect sparsity S_ij
     - Compute scale_factor_ij = λ_j × L_i

  2. Fit the Exponential model to ALL individual (sf_ij, S_ij) pairs:
     scale_factor = a * exp(b * sparsity)

  3. Return the fitted a and b parameters

At inference time (user specifies target_sparsity S*):

scale_factor = a * exp(b * S*)
threshold = scale_factor / seqlen

Key insight: fitting all individual data points (N_thresholds × N_samples) instead of per-threshold averages yields a more accurate fit without any additional calibration-time cost.
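The fit in step 2 can be sketched in plain Python: taking the log of the model turns it into a linear least-squares problem (log sf = log a + b·S). The data and helper name below are illustrative, not the library's internals.

```python
import math

# Synthetic (sparsity, scale_factor) pairs pooled across all threshold
# trials and samples; generated from a made-up ground truth a=0.5, b=2.0.
sparsities = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
scale_factors = [0.5 * math.exp(2.0 * s) for s in sparsities]

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via linear least squares on log(y)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(logs) / n
    b = sum((x - mx) * (v - my) for x, v in zip(xs, logs)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - b * mx)
    return a, b

a, b = fit_exponential(sparsities, scale_factors)  # recovers a ≈ 0.5, b ≈ 2.0
```

Because every (sf_ij, S_ij) pair enters the regression, noisy per-sample variation is averaged out by the fit itself rather than by pre-averaging per threshold.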

__init__(threshold_trials=None)

Initialize dynamic threshold calibrator.

Parameters:

threshold_trials (list[float] | None) – List of thresholds to try during calibration. Should span a range that achieves sparsities from ~10% to ~95%.

calibrate(model, forward_loop, phase)

Calibrate the a and b parameters of the Exponential model.

Algorithm:
  1. For each threshold λ_j in threshold_trials:
     - Run ALL samples, collecting sparsity S_ij for each sample i
     - Compute scale_factor_ij = λ_j × L_i (where L_i is the sample length)

  2. Fit the Exponential model to ALL (sf_ij, S_ij) pairs:
     scale_factor = a * exp(b * sparsity)

  3. Return the fitted a and b parameters

At inference time (user specifies target_sparsity S*):

scale_factor = a * exp(b * S*)
threshold = scale_factor / seqlen

Parameters:
  • model (Module) – The model with sparse attention modules

  • forward_loop (Callable) – Callable that takes model and forwards calibration data

  • phase (str) – Phase to calibrate (‘prefill’ or ‘decode’)

Returns:

Dict with calibration results including a, b, r_squared, and num_data_points

Return type:

dict[str, Any]
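The inference-time mapping above is small enough to show directly. This is a sketch of the two formulas only; the helper name and numeric values are illustrative, not part of the API.

```python
import math

def threshold_from_target(a, b, target_sparsity, seqlen):
    """Map a target sparsity S* to a threshold for a given sequence length:
    scale_factor = a * exp(b * S*); threshold = scale_factor / seqlen.
    """
    return a * math.exp(b * target_sparsity) / seqlen

# With hypothetical fitted values a=0.5, b=2.0, target sparsity 0.5, seqlen 4096:
thr = threshold_from_target(0.5, 2.0, 0.5, 4096)
```

Note that the threshold shrinks as seqlen grows, which is what makes the calibrated threshold dynamic rather than a single fixed value.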

class RulerDatasetBuilder

Bases: object

Builder for RULER calibration datasets.

__init__(samples, max_seqlen, tokenizer_name_or_path, num_length_bins=4, max_length_filter=65536, seed=42, cache_dir=None, data_dir=None)

Initialize RULER dataset builder.

Parameters:
  • samples (int) – Total number of samples to generate (distributed evenly across length bins)

  • max_seqlen (int) – Maximum sequence length (length bins auto-generated as powers of 2)

  • tokenizer_name_or_path (str | object) – HuggingFace tokenizer path or tokenizer object

  • seed (int) – Random seed for reproducibility

  • num_length_bins (int) – Number of length bins to generate (default: 4)

  • max_length_filter (int) – Maximum sequence length to keep (default: 65536)

  • cache_dir (str | None) – Optional cache directory. If None, uses ~/.cache/modelopt/data/

  • data_dir (str | Path | None) – Optional path to the RULER data directory (contains an ‘essays’ subdir). Required for NIAH tasks with an essay haystack when not using the pip default layout.

Note

Length bins are auto-generated as descending powers of 2: [max_seqlen, max_seqlen/2, max_seqlen/4, …]. Generation stops when num_length_bins is reached or the length drops below 1024. Subtasks are set to all the difficult tasks defined in RULER_TASKS.
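The bin-generation rule in the note can be sketched as follows; the helper name is illustrative, and the 1024 floor is taken from the note above.

```python
def length_bins(max_seqlen, num_length_bins=4, min_len=1024):
    """Descending powers of 2 starting at max_seqlen, stopping once
    num_length_bins bins exist or the next length would fall below min_len."""
    bins = []
    length = max_seqlen
    while len(bins) < num_length_bins and length >= min_len:
        bins.append(length)
        length //= 2
    return bins

# length_bins(65536) -> [65536, 32768, 16384, 8192]
# length_bins(4096)  -> [4096, 2048, 1024]  (stops early at the 1024 floor)
```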

build_calibration_dataset()

Build the complete calibration dataset.

If cache_dir is set, the cache is checked first and cached data is returned when present. Otherwise the dataset is generated, saved to the cache (if cache_dir is set), and returned.

Returns:

List of calibration samples with ‘input’ and ‘length’ fields

Return type:

list[dict[str, Any]]

calibrate_sparse_attention(model, config, forward_loop=None)

Calibrate sparse attention parameters for optimal sparsity.

Supports both prefill and decode phase calibration with per-phase target sparsity.

Parameters:
  • model (Module) – Model with sparse attention modules

  • config (dict[str, Any]) – Sparse attention configuration dict

  • forward_loop (Callable | None) – Callable that forwards calibration data through model. If None, auto-generates RULER dataset. Only used for prefill.

Returns:

Dictionary with calibration results for each phase

Return type:

dict[str, Any]
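To illustrate the per-phase behaviour described above, the sketch below applies separately fitted (a, b) parameters and target sparsities for each phase. All numbers, dict keys, and the overall shape are made up for illustration; only the threshold formula comes from this reference.

```python
import math

# Hypothetical per-phase calibration results and per-phase target sparsities.
fits = {
    "prefill": {"a": 0.5, "b": 2.0},
    "decode": {"a": 0.8, "b": 1.5},
}
targets = {"prefill": 0.5, "decode": 0.3}
seqlen = 8192

# threshold = a * exp(b * S*) / seqlen, computed independently per phase.
thresholds = {
    phase: fits[phase]["a"] * math.exp(fits[phase]["b"] * targets[phase]) / seqlen
    for phase in fits
}
```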