mode
Mode registration for KV cache sparsity.
Classes
TriAttentionModeDescriptor: Mode descriptor for TriAttention KV cache sparsity.
- class TriAttentionModeDescriptor
Bases: ModeDescriptor
Mode descriptor for TriAttention KV cache sparsity.
TriAttention is a calibration-only mode: convert is a no-op on model weights, calibration computes per-head frequency statistics, and the results are stored in metadata for export to serving engines.
- property config_class: type[ModeloptBaseConfig]
Return the configuration class.
- property convert: Callable[[Module, ModeloptBaseConfig], tuple[Module, dict[str, Any]]] | Callable[[Module, ModeloptBaseConfig, Any], tuple[Module, dict[str, Any]]]
Return the convert entrypoint.
- property name: str
Return the mode name.
- property restore: Callable[[Module, ModeloptBaseConfig, dict[str, Any]], Module]
Return the restore entrypoint.
- property update_for_save: Callable[[Module, ModeloptBaseConfig, dict[str, Any]], None]
Return the update-for-save entrypoint.