calibration
Calibration framework for sparse attention methods.
Classes
DynamicThresholdCalibrator | Dynamic threshold calibrator using Exponential model.
RulerDatasetBuilder | Builder for RULER calibration datasets.
Functions
calibrate_sparse_attention | Calibrate sparse attention parameters for optimal sparsity.
- class DynamicThresholdCalibrator
Bases: object
Dynamic threshold calibrator using Exponential model.
- Calibration Algorithm:
1. For each threshold λ_j in threshold_trials:
   - Run ALL samples through forward_loop
   - For each sample i with length L_i, collect sparsity S_ij
   - Compute scale_factor_ij = λ_j × L_i
2. Fit Exponential model to ALL individual (scale_factor_ij, S_ij) pairs: scale_factor = a * exp(b * sparsity)
3. Return fitted a and b parameters
- At inference time (user specifies target_sparsity S*):
scale_factor = a * exp(b * S*)
threshold = scale_factor / seqlen
Key insight: Using all individual data points (N_thresholds × N_samples) instead of per-threshold averages provides more accurate fitting without additional calibration time cost.
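The fit and the inference-time formula above can be sketched in a few lines. This is a minimal illustration, not the library's fitting routine: it assumes a log-linear least-squares fit (taking the logarithm of scale_factor turns the Exponential model into a straight line), and the helper names `fit_exponential` and `threshold_for` as well as the data are hypothetical and synthetic.

```python
import math

def fit_exponential(pairs):
    """Fit scale_factor = a * exp(b * sparsity) via least squares on log(scale_factor).

    `pairs` is a list of (scale_factor, sparsity) tuples, one per
    (threshold, sample) combination. Assumption: a log-linear fit is used here
    for simplicity; the actual calibrator may fit the model differently.
    """
    xs = [sparsity for _, sparsity in pairs]
    ys = [math.log(scale_factor) for scale_factor, _ in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the line in log space
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back from log space
    return a, b

def threshold_for(a, b, target_sparsity, seqlen):
    """Apply the inference-time formula: threshold = a * exp(b * S*) / seqlen."""
    scale_factor = a * math.exp(b * target_sparsity)
    return scale_factor / seqlen

# Synthetic (scale_factor, sparsity) pairs generated from a=2.0, b=3.0,
# used only to show that the fit recovers the generating parameters.
true_a, true_b = 2.0, 3.0
pairs = [(true_a * math.exp(true_b * s), s) for s in (0.1, 0.3, 0.5, 0.7, 0.9)]

a, b = fit_exponential(pairs)
threshold = threshold_for(a, b, target_sparsity=0.5, seqlen=4096)
```

Because the threshold is divided by seqlen, the same fitted a and b yield a smaller threshold for longer sequences at the same target sparsity.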
- __init__(threshold_trials=None)
Initialize dynamic threshold calibrator.
- Parameters:
threshold_trials (list[float] | None) – List of thresholds to try during calibration. Should span a range that achieves sparsities from ~10% to ~95%.
- calibrate(model, forward_loop, phase)
Calibrate a and b parameters for Exponential model.
- Algorithm:
1. For each threshold λ_j in threshold_trials:
   - Run ALL samples, collect sparsities S_ij for each sample i
   - Compute scale_factor_ij = λ_j × L_i (where L_i is sample length)
2. Fit Exponential model to ALL (scale_factor_ij, S_ij) pairs: scale_factor = a * exp(b * sparsity)
3. Return fitted a and b parameters
- At inference time (user specifies target_sparsity S*):
scale_factor = a * exp(b * S*)
threshold = scale_factor / seqlen
- Parameters:
model (Module) – The model with sparse attention modules
forward_loop (Callable) – Callable that takes model and forwards calibration data
phase (str) – Phase to calibrate (‘prefill’ or ‘decode’)
- Returns:
Dict with calibration results including a, b, r_squared, and num_data_points
- Return type:
dict[str, Any]
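As a rough illustration of the data-collection step and of the fields in the returned dict, the sketch below gathers one pair per (threshold, sample) combination and scores a candidate fit. The helpers `collect_pairs`, `r_squared`, and `toy_sparsity` are hypothetical, `toy_sparsity` replaces a real forward pass with a synthetic curve, and computing R² on raw scale factors (rather than in log space) is an assumption.

```python
import math

def collect_pairs(threshold_trials, sample_lengths, measure_sparsity):
    """Return one (scale_factor, sparsity) pair per (threshold, sample) combination."""
    pairs = []
    for threshold in threshold_trials:
        for length in sample_lengths:
            sparsity = measure_sparsity(threshold, length)
            pairs.append((threshold * length, sparsity))  # scale_factor_ij = λ_j × L_i
    return pairs

def r_squared(pairs, a, b):
    """Coefficient of determination of a * exp(b * sparsity) vs. observed scale factors."""
    observed = [scale_factor for scale_factor, _ in pairs]
    predicted = [a * math.exp(b * sparsity) for _, sparsity in pairs]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def toy_sparsity(threshold, length, a=2.0, b=3.0):
    """Hypothetical stand-in for a forward pass: inverts the Exponential model."""
    return math.log(threshold * length / a) / b

threshold_trials = [2e-3, 3e-3, 4e-3]
sample_lengths = [1024, 2048, 4096, 8192]
pairs = collect_pairs(threshold_trials, sample_lengths, toy_sparsity)

# Same keys as the documented return value; the numbers are synthetic.
result = {
    "a": 2.0,
    "b": 3.0,
    "r_squared": r_squared(pairs, 2.0, 3.0),
    "num_data_points": len(pairs),
}
```

Here num_data_points equals N_thresholds × N_samples, since every individual pair is kept rather than averaged per threshold.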
- class RulerDatasetBuilder
Bases: object
Builder for RULER calibration datasets.
- __init__(samples, max_seqlen, tokenizer_name_or_path, num_length_bins=4, max_length_filter=65536, seed=42, cache_dir=None, data_dir=None)
Initialize RULER dataset builder.
- Parameters:
samples (int) – Total number of samples to generate (distributed evenly across length bins)
max_seqlen (int) – Maximum sequence length (length bins auto-generated as powers of 2)
tokenizer_name_or_path (str | object) – HuggingFace tokenizer path or tokenizer object
seed (int) – Random seed for reproducibility
num_length_bins (int) – Number of length bins to generate (default: 4)
max_length_filter (int) – Maximum sequence length to keep (default: 65536)
cache_dir (str | None) – Optional cache directory. If None, uses ~/.cache/modelopt/data/
data_dir (str | Path | None) – Optional path to RULER data directory (contains ‘essays’ subdir). Required for NIAH tasks with essay haystack when not using pip default layout.
Note
Length bins are auto-generated by repeatedly halving max_seqlen: [max_seqlen, max_seqlen/2, max_seqlen/4, …]. Generation stops when num_length_bins is reached or the length falls below 1024. Subtasks are set to all the difficult tasks defined in RULER_TASKS.
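The bin rule in the note can be sketched as follows. The function names `make_length_bins` and `samples_per_bin` are hypothetical, and the way a remainder is spread across bins when samples is not divisible by the bin count is an assumption; the builder may handle it differently.

```python
def make_length_bins(max_seqlen, num_length_bins=4, min_length=1024):
    """Halve max_seqlen until num_length_bins is reached or length < min_length."""
    bins = []
    length = max_seqlen
    while len(bins) < num_length_bins and length >= min_length:
        bins.append(length)
        length //= 2
    return bins

def samples_per_bin(samples, bins):
    """Split `samples` as evenly as possible across bins.

    Assumption: any remainder goes to the first (longest) bins.
    """
    base, extra = divmod(samples, len(bins))
    return [base + (1 if i < extra else 0) for i in range(len(bins))]

bins_large = make_length_bins(32768)   # stops because 4 bins were produced
bins_small = make_length_bins(2048)    # stops because the next length is below 1024
counts = samples_per_bin(10, bins_large)
```

With max_seqlen=2048 only two bins are produced, so fewer bins than num_length_bins can result for short maximum lengths.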
- build_calibration_dataset()
Build the complete calibration dataset.
If cache_dir was set, checks cache first and returns cached data if present. Otherwise generates the dataset, saves to cache (if cache_dir set), and returns.
- Returns:
List of calibration samples with ‘input’ and ‘length’ fields
- Return type:
list[dict[str, Any]]
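The cache-then-generate behavior described above follows a common pattern, sketched below under stated assumptions: the helper `load_or_build`, the file name `calibration.json`, and the JSON storage format are all hypothetical and say nothing about how the builder actually stores its cache.

```python
import json
import tempfile
from pathlib import Path

def load_or_build(cache_dir, build_fn, filename="calibration.json"):
    """Return cached samples if present; otherwise build and (optionally) cache them."""
    cache_file = None
    if cache_dir is not None:
        cache_file = Path(cache_dir) / filename
        if cache_file.exists():
            return json.loads(cache_file.read_text())
    dataset = build_fn()
    if cache_file is not None:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(dataset))
    return dataset

build_calls = []

def toy_build():
    """Hypothetical generator that records how often it runs."""
    build_calls.append(1)
    return [{"input": "example prompt", "length": 1024}]

with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(tmp, toy_build)    # generates and writes the cache
    second = load_or_build(tmp, toy_build)   # served from the cache
```

Each sample in the sketch carries the same two fields the method documents, 'input' and 'length'.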
- calibrate_sparse_attention(model, config, forward_loop=None)
Calibrate sparse attention parameters for optimal sparsity.
Supports both prefill and decode phase calibration with per-phase target sparsity.
- Parameters:
model (Module) – Model with sparse attention modules
config (dict[str, Any]) – Sparse attention configuration dict
forward_loop (Callable | None) – Callable that forwards calibration data through model. If None, auto-generates RULER dataset. Only used for prefill.
- Returns:
Dictionary with calibration results for each phase
- Return type:
dict[str, Any]
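A forward_loop is simply a callable that receives the model and pushes calibration data through it. The sketch below shows that shape without depending on any framework: `make_forward_loop` and `toy_model` are hypothetical, and a real loop would typically move batches to the model's device and disable gradient tracking.

```python
def make_forward_loop(batches):
    """Build a forward_loop(model) callable that runs every batch through the model."""
    def forward_loop(model):
        for batch in batches:
            model(batch)  # outputs are discarded; only the forward pass matters
    return forward_loop

seen_lengths = []

def toy_model(batch):
    """Hypothetical stand-in for a model: records the length of each batch it sees."""
    seen_lengths.append(len(batch))

forward_loop = make_forward_loop([[1, 2, 3], [4, 5]])
forward_loop(toy_model)
```

A callable of this shape is what the forward_loop parameter expects; passing None instead falls back to the auto-generated RULER dataset, as described above.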