diffusers_utils

Code that exports quantized Hugging Face models for deployment.

Functions

generate_diffusion_dummy_forward_fn

Create a dummy forward function for diffusion(-like) models.

generate_diffusion_dummy_inputs

Generate dummy inputs for diffusion model forward pass.

get_diffusers_components

Get all exportable components from a diffusion(-like) pipeline.

get_diffusion_components

Get all exportable components from a diffusion(-like) pipeline.

get_qkv_group_key

Extract the parent attention block path and QKV type for grouping.

hide_quantizers_from_state_dict

Context manager that temporarily removes quantizer modules from the model.

infer_dtype_from_model

Infer the dtype from a model's parameters.

is_diffusers_object

Return True if model is a diffusers pipeline/component or LTX-2 pipeline.

is_qkv_projection

Check if a module name corresponds to a QKV projection layer.

generate_diffusion_dummy_forward_fn(model)

Create a dummy forward function for diffusion(-like) models.

  • For diffusers components, this uses generate_diffusion_dummy_inputs() and calls model(**kwargs).

  • For LTX-2 stage-1 transformer (X0Model), the forward signature is model(video: Modality|None, audio: Modality|None, perturbations: BatchedPerturbationConfig), so we build tiny ltx_core dataclasses and call the model directly.

Parameters:

model (Module)

Return type:

Callable[[], None]
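
A minimal usage sketch; the import path of diffusers_utils and the checkpoint id are assumptions, adjust them to your installation:

    import torch
    from diffusers import UNet2DConditionModel

    # Assumed import path -- adjust to wherever diffusers_utils lives in your package.
    from diffusers_utils import generate_diffusion_dummy_forward_fn

    # Load a single diffusers component (the checkpoint id is a placeholder).
    unet = UNet2DConditionModel.from_pretrained(
        "<repo-id>", subfolder="unet", torch_dtype=torch.float16
    ).to("cuda")

    # Zero-argument callable that runs one forward pass on tiny dummy inputs,
    # e.g. usable as a calibration forward loop during quantization.
    dummy_forward = generate_diffusion_dummy_forward_fn(unet)
    dummy_forward()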

generate_diffusion_dummy_inputs(model, device, dtype)

Generate dummy inputs for diffusion model forward pass.

Different diffusion models have very different input formats:

  • DiTTransformer2DModel: 4D hidden_states + class_labels

  • FluxTransformer2DModel: 3D hidden_states + encoder_hidden_states + img_ids + txt_ids + pooled_projections

  • SD3Transformer2DModel: 4D hidden_states + encoder_hidden_states + pooled_projections

  • UNet2DConditionModel: 4D sample + timestep + encoder_hidden_states

  • WanTransformer3DModel: 5D hidden_states + encoder_hidden_states + timestep

Parameters:
  • model (Module) – The diffusion model component.

  • device (device) – Device to create tensors on.

  • dtype (dtype) – Data type for tensors.

Returns:

Dictionary of dummy inputs, or None if the model type is not supported.

Return type:

dict[str, Tensor] | None
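
A sketch of generating and running dummy inputs for an already-loaded diffusion component model; the import path is an assumption:

    import torch

    # Assumed import path -- adjust to your installation.
    from diffusers_utils import generate_diffusion_dummy_inputs

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # `model` is any supported component, e.g. a UNet2DConditionModel.
    dummy_inputs = generate_diffusion_dummy_inputs(model, device, torch.float16)
    if dummy_inputs is None:
        raise ValueError(f"Unsupported diffusion model type: {type(model).__name__}")

    # Inputs are keyword arguments for the component's forward pass.
    with torch.no_grad():
        model(**dummy_inputs)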

get_diffusers_components(model, components=None)

Get all exportable components from a diffusion(-like) pipeline.

Supports:

  • diffusers DiffusionPipeline: returns pipeline.components

  • diffusers component nn.Module (e.g., UNet / transformer)

  • LTX-2 pipeline (duck-typed): returns only the stage-1 transformer, as stage_1_transformer

Parameters:
  • model (Any) – The pipeline or component.

  • components (list[str] | None) – Optional list of component names to filter. If None, all components are returned.

Returns:

Dictionary mapping component names to their instances (can be nn.Module, tokenizers, schedulers, etc.).

Return type:

dict[str, Any]

get_diffusion_components(model, components=None)

Get all exportable components from a diffusion(-like) pipeline.

Supports:

  • diffusers DiffusionPipeline: returns pipeline.components

  • diffusers component nn.Module (e.g., UNet / transformer)

  • LTX-2 pipeline (duck-typed): returns only the stage-1 transformer, as stage_1_transformer

Parameters:
  • model (Any) – The pipeline or component.

  • components (list[str] | None) – Optional list of component names to filter. If None, all components are returned.

Returns:

Dictionary mapping component names to their instances (can be nn.Module, tokenizers, schedulers, etc.).

Return type:

dict[str, Any]
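
A usage sketch that applies to either function (get_diffusion_components behaves identically); the import path and checkpoint id are assumptions:

    import torch.nn as nn
    from diffusers import DiffusionPipeline

    # Assumed import path -- adjust to your installation.
    from diffusers_utils import get_diffusers_components, is_diffusers_object

    pipe = DiffusionPipeline.from_pretrained("<repo-id>")  # placeholder checkpoint id
    assert is_diffusers_object(pipe)

    # All components: denoiser, VAE, text encoders, tokenizers, scheduler, ...
    components = get_diffusers_components(pipe)

    # Filter to specific names (use names that exist in this pipeline).
    unet_only = get_diffusers_components(pipe, components=["unet"])

    # Only nn.Module components are candidates for quantization/export.
    modules = {name: c for name, c in components.items() if isinstance(c, nn.Module)}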

get_qkv_group_key(module_name)

Extract the parent attention block path and QKV type for grouping.

QKV projections should only be fused within the same attention block AND for the same type of attention (main vs added/cross).

Examples

  • ‘transformer_blocks.0.attn.to_q’ -> ‘transformer_blocks.0.attn.main’

  • ‘transformer_blocks.0.attn.to_k’ -> ‘transformer_blocks.0.attn.main’

  • ‘transformer_blocks.5.attn.add_q_proj’ -> ‘transformer_blocks.5.attn.add’

  • ‘transformer_blocks.5.attn.add_k_proj’ -> ‘transformer_blocks.5.attn.add’

Parameters:

module_name (str) – The full module name path.

Returns:

A string key representing the attention block and QKV type for grouping.

Return type:

str
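
A sketch of how the key can bucket projection names into fusion groups; the module names are illustrative and the import path is an assumption:

    from collections import defaultdict

    # Assumed import path -- adjust to your installation.
    from diffusers_utils import get_qkv_group_key

    qkv_names = [
        "transformer_blocks.0.attn.to_q",
        "transformer_blocks.0.attn.to_k",
        "transformer_blocks.0.attn.to_v",
        "transformer_blocks.5.attn.add_q_proj",
        "transformer_blocks.5.attn.add_k_proj",
    ]

    groups: dict[str, list[str]] = defaultdict(list)
    for name in qkv_names:
        groups[get_qkv_group_key(name)].append(name)

    # groups now has one entry per fusible set:
    #   "transformer_blocks.0.attn.main" -> to_q, to_k, to_v
    #   "transformer_blocks.5.attn.add"  -> add_q_proj, add_k_proj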

hide_quantizers_from_state_dict(model)

Context manager that temporarily removes quantizer modules from the model.

This allows save_pretrained to save the model without quantizer buffers like _amax. The quantizers are restored after exiting the context.

Parameters:

model (Module) – The model with quantizers to temporarily hide.

Yields:

None - the model can be saved within the context.
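
A sketch of saving a quantized component without its quantizer buffers; the import path is an assumption and unet stands for any quantized diffusers component:

    # Assumed import path -- adjust to your installation.
    from diffusers_utils import hide_quantizers_from_state_dict

    with hide_quantizers_from_state_dict(unet):
        # Inside the context the state dict carries no quantizer buffers
        # such as _amax, so the checkpoint looks like a plain diffusers model.
        unet.save_pretrained("exported_unet")
    # Quantizer modules are restored once the context exits.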

infer_dtype_from_model(model)

Infer the dtype from a model’s parameters.

Parameters:

model (Module) – The model to infer dtype from.

Returns:

The dtype of the model’s parameters, defaulting to float16 if no parameters found.

Return type:

dtype
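
A small sketch, including the documented float16 fallback for parameter-less modules; the import path is an assumption:

    import torch
    import torch.nn as nn

    # Assumed import path -- adjust to your installation.
    from diffusers_utils import infer_dtype_from_model

    bf16_layer = nn.Linear(8, 8).to(torch.bfloat16)
    assert infer_dtype_from_model(bf16_layer) == torch.bfloat16

    no_params = nn.Identity()  # module with no parameters
    assert infer_dtype_from_model(no_params) == torch.float16  # documented default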

is_diffusers_object(model)

Return True if model is a diffusers pipeline/component or LTX-2 pipeline.

Parameters:

model (Any)

Return type:

bool

is_qkv_projection(module_name)

Check if a module name corresponds to a QKV projection layer.

In diffusers, QKV projections typically have names like:

  • to_q, to_k, to_v (most common in diffusers attention)

  • q_proj, k_proj, v_proj

  • query, key, value

  • add_q_proj, add_k_proj, add_v_proj (for additional attention in some models)

We exclude:

  • norm*.linear (AdaLayerNorm modulation layers)

  • proj_out, proj_mlp (output projections)

  • ff.*, mlp.* (feed-forward layers)

  • to_out (output projection)

Parameters:

module_name (str) – The full module name path.

Returns:

True if this is a QKV projection layer.

Return type:

bool
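
A sketch combining is_qkv_projection with get_qkv_group_key to collect fusible QKV layers from a component; the import path is an assumption, and restricting to nn.Linear submodules is a choice of this sketch rather than a requirement of the API:

    from collections import defaultdict

    import torch.nn as nn

    # Assumed import path -- adjust to your installation.
    from diffusers_utils import get_qkv_group_key, is_qkv_projection

    def collect_qkv_groups(model: nn.Module) -> dict[str, list[str]]:
        """Group names of fusible QKV projections by attention block and type."""
        groups: dict[str, list[str]] = defaultdict(list)
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear) and is_qkv_projection(name):
                groups[get_qkv_group_key(name)].append(name)
        return dict(groups)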