kernels

Kernel integrations for sparse attention: Triton FlashAttention and diffusers backends.

Functions

get_skip_softmax_context

Return True if skip-softmax eager attention is active in this thread.

register_diffusers_eager_attention

Register the modelopt_skip_softmax backend in diffusers.

register_diffusers_triton_attention

Register the modelopt_triton backend in diffusers.

set_skip_softmax_context

Set the thread-local flag indicating that skip-softmax eager attention is active.

get_skip_softmax_context()

Return True if skip-softmax eager attention is active in this thread.

Return type:

bool

register_diffusers_eager_attention()

Register the modelopt_skip_softmax backend in diffusers.

Safe to call multiple times; registration happens only once.

Return type:

None

register_diffusers_triton_attention()

Register the modelopt_triton backend in diffusers.

Safe to call multiple times; registration happens only once.

Return type:

None

set_skip_softmax_context(active)

Set the thread-local flag indicating that skip-softmax eager attention is active.

Parameters:

active (bool)

Return type:

None
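The set/get pair above describes a per-thread flag: each thread sees only the value it set itself. A minimal sketch of how such a pair is typically implemented with `threading.local`; the storage shown here is an assumption about the internals, and the default-to-False behavior when the flag was never set is inferred from the getter's documented bool return:

```python
import threading

_local = threading.local()  # each thread gets its own attribute namespace


def set_skip_softmax_context(active: bool) -> None:
    """Set the flag for the current thread only."""
    _local.active = active


def get_skip_softmax_context() -> bool:
    """Read the current thread's flag; False if it was never set."""
    return getattr(_local, "active", False)


set_skip_softmax_context(True)

# A freshly spawned thread does not inherit the flag:
result = []
t = threading.Thread(target=lambda: result.append(get_skip_softmax_context()))
t.start()
t.join()
# result[0] is False even though the main thread's flag is True
```

The thread-local scoping means one thread can enable skip-softmax eager attention around a region of work without affecting attention dispatch in concurrently running threads.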