kv_cache
Modules
Configuration for KV cache sparsity modes.
Convert/restore/update entrypoints for TriAttention mode.
Mode registration for KV cache sparsity.
Entry points for KV cache sparsity: sparsify() and calibrate().
TriAttention: Trigonometric KV cache compression.
KV cache sparsity algorithms for LLM inference optimization.