kernels
Kernel integrations for sparse attention: Triton FA and diffusers backends.
Functions
- get_skip_softmax_context(): Return True if skip-softmax eager attention is active in this thread.
- register_diffusers_eager_attention(): Register the modelopt_skip_softmax backend in diffusers.
- register_diffusers_triton_attention(): Register the modelopt_triton backend in diffusers.
- set_skip_softmax_context(active): Set the thread-local flag indicating skip-softmax eager attention is active.
- get_skip_softmax_context()
Return True if skip-softmax eager attention is active in this thread.
- Return type:
bool
- register_diffusers_eager_attention()
Register the modelopt_skip_softmax backend in diffusers. Safe to call multiple times; registration happens only once.
- Return type:
None
- register_diffusers_triton_attention()
Register the modelopt_triton backend in diffusers. Safe to call multiple times; registration happens only once.
- Return type:
None
- set_skip_softmax_context(active)
Set the thread-local flag indicating that skip-softmax eager attention is active.
- Parameters:
active (bool)
- Return type:
None
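The get/set pair above describes a per-thread flag. A minimal sketch of how such a context flag can be implemented with `threading.local` (the storage variable and function bodies here are assumptions for illustration, not the actual implementation):

```python
import threading

# Thread-local storage: each thread sees its own copy of the flag (sketch).
_local = threading.local()

def set_skip_softmax_context(active):
    """Set the thread-local flag marking skip-softmax eager attention active."""
    _local.active = bool(active)

def get_skip_softmax_context():
    """Return True if skip-softmax eager attention is active in this thread."""
    return getattr(_local, "active", False)
```

Because the flag lives in thread-local storage, enabling skip-softmax attention in one worker thread does not change behavior in any other thread, and the getter defaults to False in threads that never called the setter.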