calibration

Calibration framework for sparse attention methods.

Classes

DynamicThresholdCalibrator

Dynamic threshold calibrator using an Exponential model.

RulerDatasetBuilder

Builder for RULER calibration datasets.

Functions

calibrate_sparse_attention

Calibrate sparse attention parameters for optimal sparsity.

class DynamicThresholdCalibrator

Bases: object

Dynamic threshold calibrator using an Exponential model.

Calibration Algorithm:
  1. For each threshold λ_j in threshold_trials:
     - Run ALL samples through forward_loop
     - For each sample i with length L_i, collect sparsity S_ij
     - Compute scale_factor_ij = λ_j × L_i

  2. Fit the Exponential model to ALL individual (sf_ij, S_ij) pairs:
     scale_factor = a * exp(b * sparsity)

  3. Return the fitted a and b parameters

At inference time (user specifies target_sparsity S*):

scale_factor = a * exp(b * S*)
threshold = scale_factor / seqlen

Key insight: fitting all individual data points (N_thresholds × N_samples) instead of per-threshold averages yields a more accurate fit without any additional calibration-time cost.
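The fit in step 2 can be sketched in plain Python: taking the log of the model turns it into a linear least-squares problem (log sf = log a + b·S). The data and helper name below are illustrative, not the library's internals.

```python
import math

# Synthetic (sparsity, scale_factor) pairs pooled across all threshold
# trials and samples; generated from a made-up ground truth a=0.5, b=2.0.
sparsities = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
scale_factors = [0.5 * math.exp(2.0 * s) for s in sparsities]

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via linear least squares on log(y)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(logs) / n
    b = sum((x - mx) * (v - my) for x, v in zip(xs, logs)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - b * mx)
    return a, b

a, b = fit_exponential(sparsities, scale_factors)  # recovers a ≈ 0.5, b ≈ 2.0
```

Because every (sf_ij, S_ij) pair enters the regression, noisy per-sample variation is averaged out by the fit itself rather than by pre-averaging per threshold.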

__init__(threshold_trials=None)

Initialize dynamic threshold calibrator.

Parameters:

threshold_trials (list[float] | None) – List of thresholds to try during calibration. Should span a range that achieves sparsities from ~10% to ~95%.

calibrate(model, forward_loop, phase)

Calibrate the a and b parameters of the Exponential model.

Algorithm:
  1. For each threshold λ_j in threshold_trials:
     - Run ALL samples, collecting sparsity S_ij for each sample i
     - Compute scale_factor_ij = λ_j × L_i (where L_i is the sample length)

  2. Fit the Exponential model to ALL (sf_ij, S_ij) pairs:
     scale_factor = a * exp(b * sparsity)

  3. Return the fitted a and b parameters

At inference time (user specifies target_sparsity S*):

scale_factor = a * exp(b * S*)
threshold = scale_factor / seqlen

Parameters:
  • model (Module) – The model with sparse attention modules

  • forward_loop (Callable) – Callable that takes model and forwards calibration data

  • phase (str) – Phase to calibrate (‘prefill’ or ‘decode’)

Returns:

Dict with calibration results including a, b, r_squared, and num_data_points

Return type:

dict[str, Any]
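The inference-time mapping above is small enough to show directly. This is a sketch of the two formulas only; the helper name and numeric values are illustrative, not part of the API.

```python
import math

def threshold_from_target(a, b, target_sparsity, seqlen):
    """Map a target sparsity S* to a threshold for a given sequence length:
    scale_factor = a * exp(b * S*); threshold = scale_factor / seqlen.
    """
    return a * math.exp(b * target_sparsity) / seqlen

# With hypothetical fitted values a=0.5, b=2.0, target sparsity 0.5, seqlen 4096:
thr = threshold_from_target(0.5, 2.0, 0.5, 4096)
```

Note that the threshold shrinks as seqlen grows, which is what makes the calibrated threshold dynamic rather than a single fixed value.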

class RulerDatasetBuilder

Bases: object

Builder for RULER calibration datasets.

__init__(samples, max_seqlen, tokenizer_name_or_path, num_length_bins=4, max_length_filter=65536, seed=42, cache_dir=None, data_dir=None)

Initialize RULER dataset builder.

Parameters:
  • samples (int) – Total number of samples to generate (distributed evenly across length bins)

  • max_seqlen (int) – Maximum sequence length (length bins auto-generated as powers of 2)

  • tokenizer_name_or_path (str | object) – HuggingFace tokenizer path or tokenizer object

  • seed (int) – Random seed for reproducibility

  • num_length_bins (int) – Number of length bins to generate (default: 4)

  • max_length_filter (int) – Maximum sequence length to keep (default: 65536)

  • cache_dir (str | None) – Optional cache directory. If None, uses ~/.cache/modelopt/data/

  • data_dir (str | Path | None) – Optional path to the RULER data directory (contains an ‘essays’ subdir). Required for NIAH tasks with an essay haystack when not using the pip default layout.

Note

Length bins are auto-generated as descending powers of 2: [max_seqlen, max_seqlen/2, max_seqlen/4, …]. Generation stops when num_length_bins is reached or the length drops below 1024. Subtasks are set to all the difficult tasks defined in RULER_TASKS.
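The bin-generation rule in the note can be sketched as follows; the helper name is illustrative, and the 1024 floor is taken from the note above.

```python
def length_bins(max_seqlen, num_length_bins=4, min_len=1024):
    """Descending powers of 2 starting at max_seqlen, stopping once
    num_length_bins bins exist or the next length would fall below min_len."""
    bins = []
    length = max_seqlen
    while len(bins) < num_length_bins and length >= min_len:
        bins.append(length)
        length //= 2
    return bins

# length_bins(65536) -> [65536, 32768, 16384, 8192]
# length_bins(4096)  -> [4096, 2048, 1024]  (stops early at the 1024 floor)
```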

build_calibration_dataset()

Build the complete calibration dataset.

If cache_dir is set, the cache is checked first and cached data is returned when present. Otherwise the dataset is generated, saved to the cache (if cache_dir is set), and returned.

Returns:

List of calibration samples with ‘input’ and ‘length’ fields

Return type:

list[dict[str, Any]]

calibrate_sparse_attention(model, config, forward_loop=None)

Calibrate sparse attention parameters for optimal sparsity.

Supports both prefill and decode phase calibration with per-phase target sparsity.

Parameters:
  • model (Module) – Model with sparse attention modules

  • config (dict[str, Any]) – Sparse attention configuration dict

  • forward_loop (Callable | None) – Callable that forwards calibration data through model. If None, auto-generates RULER dataset. Only used for prefill.

Returns:

Dictionary with calibration results for each phase

Return type:

dict[str, Any]
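To illustrate the per-phase behaviour described above, the sketch below applies separately fitted (a, b) parameters and target sparsities for each phase. All numbers, dict keys, and the overall shape are made up for illustration; only the threshold formula comes from this reference.

```python
import math

# Hypothetical per-phase calibration results and per-phase target sparsities.
fits = {
    "prefill": {"a": 0.5, "b": 2.0},
    "decode": {"a": 0.8, "b": 1.5},
}
targets = {"prefill": 0.5, "decode": 0.3}
seqlen = 8192

# threshold = a * exp(b * S*) / seqlen, computed independently per phase.
thresholds = {
    phase: fits[phase]["a"] * math.exp(fits[phase]["b"] * targets[phase]) / seqlen
    for phase in fits
}
```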