kernels

Kernel integrations for sparse attention: Triton FlashAttention and diffusers backends.

Functions

get_skip_softmax_context

Return True if skip-softmax eager attention is active in this thread.

register_diffusers_eager_attention

Register the modelopt_skip_softmax backend in diffusers.

register_diffusers_triton_attention

Register the modelopt_triton backend in diffusers.

set_skip_softmax_context

Set the thread-local flag indicating that skip-softmax eager attention is active.

get_skip_softmax_context()

Return True if skip-softmax eager attention is active in this thread.

Return type:

bool

register_diffusers_eager_attention()

Register the modelopt_skip_softmax backend in diffusers.

Safe to call multiple times; registration happens only once.

Return type:

None

register_diffusers_triton_attention()

Register the modelopt_triton backend in diffusers.

Safe to call multiple times; registration happens only once.

Return type:

None

set_skip_softmax_context(active)

Set the thread-local flag indicating that skip-softmax eager attention is active.

Parameters:

active (bool)

Return type:

None
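The set/get pair above describes a per-thread flag: each thread sees only the value it set itself. A minimal sketch of how such a pair is typically implemented with `threading.local`; the storage shown here is an assumption about the internals, and the default-to-False behavior when the flag was never set is inferred from the getter's documented bool return:

```python
import threading

_local = threading.local()  # each thread gets its own attribute namespace


def set_skip_softmax_context(active: bool) -> None:
    """Set the flag for the current thread only."""
    _local.active = active


def get_skip_softmax_context() -> bool:
    """Read the current thread's flag; False if it was never set."""
    return getattr(_local, "active", False)


set_skip_softmax_context(True)

# A freshly spawned thread does not inherit the flag:
result = []
t = threading.Thread(target=lambda: result.append(get_skip_softmax_context()))
t.start()
t.join()
# result[0] is False even though the main thread's flag is True
```

The thread-local scoping means one thread can enable skip-softmax eager attention around a region of work without affecting attention dispatch in concurrently running threads.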