kv_cache

Modules

modelopt.torch.sparsity.kv_cache.config

Configuration for KV cache sparsity modes.

modelopt.torch.sparsity.kv_cache.conversion

Convert, restore, and update entry points for TriAttention mode.

modelopt.torch.sparsity.kv_cache.mode

Mode registration for KV cache sparsity.

modelopt.torch.sparsity.kv_cache.model_sparsify

Entry points for KV cache sparsity: sparsify() and calibrate().

modelopt.torch.sparsity.kv_cache.triattention

TriAttention: Trigonometric KV cache compression.

KV cache sparsity algorithms for LLM inference optimization.