diffusers_utils
Code that exports quantized Hugging Face models for deployment.
Functions
| Function | Description |
|---|---|
| generate_diffusion_dummy_forward_fn | Create a dummy forward function for diffusion(-like) models. |
| generate_diffusion_dummy_inputs | Generate dummy inputs for diffusion model forward pass. |
| get_diffusers_components | Get all exportable components from a diffusion(-like) pipeline. |
| get_diffusion_components | Get all exportable components from a diffusion(-like) pipeline. |
| get_qkv_group_key | Extract the parent attention block path and QKV type for grouping. |
| hide_quantizers_from_state_dict | Context manager that temporarily removes quantizer modules from the model. |
| infer_dtype_from_model | Infer the dtype from a model's parameters. |
| is_diffusers_object | Return True if model is a diffusers pipeline/component or LTX-2 pipeline. |
| is_qkv_projection | Check if a module name corresponds to a QKV projection layer. |
- generate_diffusion_dummy_forward_fn(model)
Create a dummy forward function for diffusion(-like) models.
For diffusers components, this uses generate_diffusion_dummy_inputs() and calls model(**kwargs).
For the LTX-2 stage-1 transformer (X0Model), the forward signature is model(video: Modality | None, audio: Modality | None, perturbations: BatchedPerturbationConfig), so tiny ltx_core dataclasses are built and the model is called directly.
- Parameters:
model (Module)
- Return type:
Callable[[], None]
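A minimal usage sketch of generate_diffusion_dummy_forward_fn. The import path and the checkpoint name are assumptions (adjust them to your package layout and model); the zero-argument callable is useful for driving a single tracing or calibration pass without real data.

```python
import torch
from diffusers import DiffusionPipeline

# Import path is an assumption; adjust to wherever diffusers_utils lives in your package.
from diffusers_utils import generate_diffusion_dummy_forward_fn

# Checkpoint name is illustrative.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Build a zero-argument callable that runs one forward pass with dummy inputs.
forward_fn = generate_diffusion_dummy_forward_fn(pipe.unet)
forward_fn()  # e.g., to trace or calibrate the component without real prompts
```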
- generate_diffusion_dummy_inputs(model, device, dtype)
Generate dummy inputs for diffusion model forward pass.
Different diffusion models have very different input formats:
- DiTTransformer2DModel: 4D hidden_states + class_labels
- FluxTransformer2DModel: 3D hidden_states + encoder_hidden_states + img_ids + txt_ids + pooled_projections
- SD3Transformer2DModel: 4D hidden_states + encoder_hidden_states + pooled_projections
- UNet2DConditionModel: 4D sample + timestep + encoder_hidden_states
- WanTransformer3DModel: 5D hidden_states + encoder_hidden_states + timestep
- Parameters:
model (Module) – The diffusion model component.
device (device) – Device to create tensors on.
dtype (dtype) – Data type for tensors.
- Returns:
Dictionary of dummy inputs, or None if model type is not supported.
- Return type:
dict[str, Tensor] | None
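A hedged sketch of how the dummy inputs from generate_diffusion_dummy_inputs might be consumed; the import path and checkpoint name are placeholders.

```python
import torch
from diffusers import UNet2DConditionModel

# Import path is an assumption; adjust to your package layout.
from diffusers_utils import generate_diffusion_dummy_inputs

# Checkpoint name is illustrative.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
).to("cuda")

dummy = generate_diffusion_dummy_inputs(unet, device=torch.device("cuda"), dtype=torch.float16)
if dummy is None:
    raise ValueError("Model type not supported for dummy input generation")

with torch.no_grad():
    unet(**dummy)  # one forward pass with shape-correct dummy tensors
```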
- get_diffusers_components(model, components=None)
Get all exportable components from a diffusion(-like) pipeline.
Supports:
- diffusers DiffusionPipeline: returns pipeline.components
- diffusers component nn.Module (e.g., UNet / transformer)
- LTX-2 pipeline (duck-typed): returns the stage-1 transformer only, as stage_1_transformer
- Parameters:
model (Any) – The pipeline or component.
components (list[str] | None) – Optional list of component names to filter. If None, all components are returned.
- Returns:
Dictionary mapping component names to their instances (can be nn.Module, tokenizers, schedulers, etc.).
- Return type:
dict[str, Any]
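A usage sketch for get_diffusers_components (import path and checkpoint assumed): iterate over a whole pipeline and keep only the nn.Module components that can actually be exported.

```python
import torch
from diffusers import DiffusionPipeline

from diffusers_utils import get_diffusers_components  # import path is an assumption

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

components = get_diffusers_components(pipe)
for name, component in components.items():
    # Tokenizers and schedulers are returned too; filter to torch modules if needed.
    if isinstance(component, torch.nn.Module):
        print(name, type(component).__name__)
```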
- get_diffusion_components(model, components=None)
Get all exportable components from a diffusion(-like) pipeline.
Supports:
- diffusers DiffusionPipeline: returns pipeline.components
- diffusers component nn.Module (e.g., UNet / transformer)
- LTX-2 pipeline (duck-typed): returns the stage-1 transformer only, as stage_1_transformer
- Parameters:
model (Any) – The pipeline or component.
components (list[str] | None) – Optional list of component names to filter. If None, all components are returned.
- Returns:
Dictionary mapping component names to their instances (can be nn.Module, tokenizers, schedulers, etc.).
- Return type:
dict[str, Any]
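A sketch of the components filter of get_diffusion_components, restricting export to a subset of the pipeline. The component names and checkpoint are illustrative; the names follow the keys of pipeline.components for the concrete pipeline you load.

```python
from diffusers import DiffusionPipeline

from diffusers_utils import get_diffusion_components  # import path is an assumption

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# Component names follow pipeline.components keys ("unet", "text_encoder", "vae", ...).
selected = get_diffusion_components(pipe, components=["unet", "text_encoder"])
print(sorted(selected))  # expected: only the two requested component names
```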
- get_qkv_group_key(module_name)
Extract the parent attention block path and QKV type for grouping.
QKV projections should only be fused within the same attention block AND for the same type of attention (main vs added/cross).
Examples
- 'transformer_blocks.0.attn.to_q' -> 'transformer_blocks.0.attn.main'
- 'transformer_blocks.0.attn.to_k' -> 'transformer_blocks.0.attn.main'
- 'transformer_blocks.5.attn.add_q_proj' -> 'transformer_blocks.5.attn.add'
- 'transformer_blocks.5.attn.add_k_proj' -> 'transformer_blocks.5.attn.add'
- Parameters:
module_name (str) – The full module name path.
- Returns:
A string key representing the attention block and QKV type for grouping.
- Return type:
str
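A small sketch (import path assumed) showing how get_qkv_group_key, combined with is_qkv_projection, can drive QKV fusion grouping across a model's linear layers. The helper function here is hypothetical, not part of the module.

```python
from collections import defaultdict

import torch

from diffusers_utils import get_qkv_group_key, is_qkv_projection  # import path assumed


def group_qkv_layers(model: torch.nn.Module) -> dict[str, list[str]]:
    """Group QKV projection module names by parent attention block and QKV type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear) and is_qkv_projection(name):
            groups[get_qkv_group_key(name)].append(name)
    return dict(groups)

# Each value should contain the q/k/v (or add_q/add_k/add_v) projections of one
# attention block, which is exactly the set that is safe to fuse together.
```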
- hide_quantizers_from_state_dict(model)
Context manager that temporarily removes quantizer modules from the model.
This allows save_pretrained to save the model without quantizer buffers like _amax. The quantizers are restored after exiting the context.
- Parameters:
model (Module) – The model with quantizers to temporarily hide.
- Yields:
None - the model can be saved within the context.
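A hedged sketch of the intended save pattern for hide_quantizers_from_state_dict, assuming the import path below and a component that already carries quantizer modules from your quantization step; the checkpoint path is a placeholder.

```python
from diffusers import UNet2DConditionModel

from diffusers_utils import hide_quantizers_from_state_dict  # import path assumed

# In practice `unet` would already contain quantizer modules after quantization;
# loading a plain checkpoint here only keeps the sketch self-contained.
unet = UNet2DConditionModel.from_pretrained("path/to/checkpoint", subfolder="unet")

with hide_quantizers_from_state_dict(unet):
    # Quantizer buffers such as _amax are hidden inside the context, so the saved
    # checkpoint can be loaded by plain diffusers without quantizer classes installed.
    unet.save_pretrained("./unet-export")
# Quantizer modules are restored once the context exits.
```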
- infer_dtype_from_model(model)
Infer the dtype from a model’s parameters.
- Parameters:
model (Module) – The model to infer dtype from.
- Returns:
The dtype of the model’s parameters, defaulting to float16 if no parameters found.
- Return type:
dtype
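For reference, a conceptual re-implementation sketch of the documented behavior of infer_dtype_from_model (dtype of the first parameter, falling back to float16 for parameterless models); the actual library code may differ.

```python
import torch


def infer_dtype_from_model_sketch(model: torch.nn.Module) -> torch.dtype:
    # Take the dtype of the first parameter; fall back to float16 if there are none.
    for param in model.parameters():
        return param.dtype
    return torch.float16
```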
- is_diffusers_object(model)
Return True if model is a diffusers pipeline/component or LTX-2 pipeline.
- Parameters:
model (Any)
- Return type:
bool
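A typical dispatch pattern for is_diffusers_object (import path assumed; the wrapper function is hypothetical): check the object type first, then pull out its exportable components.

```python
from diffusers_utils import (  # import path is an assumption
    get_diffusion_components,
    is_diffusers_object,
)


def collect_export_targets(model):
    """Return exportable components for diffusers/LTX-2 objects, else the model itself."""
    if is_diffusers_object(model):
        return get_diffusion_components(model)
    return {"model": model}
```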
- is_qkv_projection(module_name)
Check if a module name corresponds to a QKV projection layer.
In diffusers, QKV projections typically have names like:
- to_q, to_k, to_v (most common in diffusers attention)
- q_proj, k_proj, v_proj
- query, key, value
- add_q_proj, add_k_proj, add_v_proj (for additional attention in some models)
We exclude:
- norm*.linear (AdaLayerNorm modulation layers)
- proj_out, proj_mlp (output projections)
- ff.*, mlp.* (feed-forward layers)
- to_out (output projection)
- Parameters:
module_name (str) – The full module name path.
- Returns:
True if this is a QKV projection layer.
- Return type:
bool
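A quick behavioral sketch of is_qkv_projection based on the naming patterns listed above; the import path is assumed, and exact matching rules for edge cases may differ from these expectations.

```python
from diffusers_utils import is_qkv_projection  # import path is an assumption

# Expected to match: standard and added-attention QKV projections.
assert is_qkv_projection("transformer_blocks.0.attn.to_q")
assert is_qkv_projection("transformer_blocks.5.attn.add_k_proj")

# Expected not to match: output projections and feed-forward layers.
assert not is_qkv_projection("transformer_blocks.0.attn.to_out.0")
assert not is_qkv_projection("transformer_blocks.0.ff.net.0.proj")
```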