mode

Mode registration for KV cache sparsity.

Classes

TriAttentionModeDescriptor: Mode descriptor for TriAttention KV cache sparsity.

class TriAttentionModeDescriptor

Bases: ModeDescriptor

Mode descriptor for TriAttention KV cache sparsity.

TriAttention is a calibration-only mode. Its convert step leaves the model weights unchanged, calibration computes per-head frequency statistics, and the resulting statistics are stored in the mode metadata for export to serving engines.
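The calibration-only flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: ToyModel, convert, calibrate, and the metadata keys are hypothetical names, and the mean absolute value is only a stand-in for the per-head frequency statistics, whose exact definition is not given on this page.

```python
from typing import Any


class ToyModel:
    """Hypothetical stand-in for a torch Module; holds weights that must stay untouched."""

    def __init__(self) -> None:
        self.weights = [0.5, -1.0, 2.0]


def convert(model: ToyModel, config: dict[str, Any]) -> tuple[ToyModel, dict[str, Any]]:
    """No-op on weights: return the same model plus a fresh metadata dict."""
    metadata: dict[str, Any] = {"config": dict(config), "calibration": None}
    return model, metadata


def calibrate(metadata: dict[str, Any], per_head_samples: dict[int, list[float]]) -> None:
    """Store one statistic per head in the metadata.

    Assumption: the mean absolute value is a placeholder statistic,
    not the statistic TriAttention actually computes.
    """
    metadata["calibration"] = {
        head: sum(abs(x) for x in samples) / len(samples)
        for head, samples in per_head_samples.items()
    }


model = ToyModel()
weights_before = list(model.weights)

model, metadata = convert(model, {"num_heads": 2})
calibrate(metadata, {0: [1.0, -3.0], 1: [2.0, 2.0]})
```

In this sketch the weights are identical before and after convert, and everything a serving engine would need lives in the metadata dict rather than in the model.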

property config_class: type[ModeloptBaseConfig]

Return the configuration class.

property convert: Callable[[Module, ModeloptBaseConfig], tuple[Module, dict[str, Any]]] | Callable[[Module, ModeloptBaseConfig, Any], tuple[Module, dict[str, Any]]]

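Return the convert entrypoint: a callable that takes the module and its config (optionally with a third argument) and returns the module together with a metadata dictionary.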

property name: str

Return the mode name.

property restore: Callable[[Module, ModeloptBaseConfig, dict[str, Any]], Module]

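Return the restore entrypoint: a callable that takes the module, its config, and the saved metadata dictionary and returns the restored module.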

property update_for_save: Callable[[Module, ModeloptBaseConfig, dict[str, Any]], None]

Return the update-for-save entrypoint: a callable that takes the module, its config, and the metadata dictionary and returns nothing.
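The property set listed above follows a descriptor pattern in which each property hands back either a plain value or a callable entrypoint. The sketch below mirrors those call shapes without importing the library. Every name in it (Module, SketchConfig, SketchModeDescriptor, the "sketch_mode" string, the "stats" key) is hypothetical, and it does not subclass the real ModeDescriptor.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Module:
    """Stand-in for torch.nn.Module so the sketch runs without PyTorch."""


@dataclass
class SketchConfig:
    """Hypothetical config; the real config fields are not shown on this page."""

    enabled: bool = True


def _convert(model: Module, config: SketchConfig) -> tuple[Module, dict[str, Any]]:
    # Return the module unchanged, together with a metadata dict.
    return model, {"stats": None}


def _restore(model: Module, config: SketchConfig, metadata: dict[str, Any]) -> Module:
    # Nothing to rebuild in this sketch; the module is returned as-is.
    return model


def _update_for_save(model: Module, config: SketchConfig, metadata: dict[str, Any]) -> None:
    # Mutates metadata in place and returns None, matching the documented signature.
    metadata.setdefault("stats", None)


class SketchModeDescriptor:
    """Hypothetical descriptor exposing the same five properties."""

    @property
    def name(self) -> str:
        return "sketch_mode"  # placeholder, not the real mode name

    @property
    def config_class(self) -> type:
        return SketchConfig

    @property
    def convert(self) -> Callable[..., tuple[Module, dict[str, Any]]]:
        return _convert

    @property
    def restore(self) -> Callable[[Module, SketchConfig, dict[str, Any]], Module]:
        return _restore

    @property
    def update_for_save(self) -> Callable[[Module, SketchConfig, dict[str, Any]], None]:
        return _update_for_save


descriptor = SketchModeDescriptor()
config = descriptor.config_class()
model, metadata = descriptor.convert(Module(), config)
descriptor.update_for_save(model, config, metadata)
restored = descriptor.restore(model, config, metadata)
```

Returning the entrypoints from properties, rather than calling them directly, lets a caller look up the functions for a mode by name and invoke them later with its own model and config.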