unified_export_hf

Code that exports quantized Hugging Face models for deployment.

Functions

export_hf_checkpoint

Export quantized HuggingFace model checkpoint (transformers or diffusers).

export_hf_checkpoint(model, dtype=None, export_dir='/tmp', save_modelopt_state=False, components=None)

This function automatically detects whether the model is from transformers or diffusers and applies the appropriate export logic.

Parameters:
  • model (Module | DiffusionPipeline) – The full torch model to export. The actual quantized model may be a submodule. Supports both transformers models (e.g., LlamaForCausalLM) and diffusers models/pipelines (e.g., StableDiffusionPipeline, UNet2DConditionModel).

  • dtype (dtype | None) – The data type used for the weights of unquantized layers in the exported checkpoint. If None, the model's default data type is used.

  • export_dir (Path | str) – The target export path.

  • save_modelopt_state (bool) – Whether to save the modelopt state_dict.

  • components (list[str] | None) – Only used for diffusers pipelines. Optional list of component names to export. If None, all quantized components are exported.
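
The parameters above can be put together in a short sketch. The wrapper below, export_quantized, is a hypothetical helper and not part of the library. It assumes export_hf_checkpoint can be imported from modelopt.torch.export, which may differ in your installation, and it passes the keyword arguments exactly as they appear in the signature above. The export_fn parameter is also an assumption, added so the wrapper can be tried with a stand-in function.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional, Union


def export_quantized(
    model: Any,
    export_dir: Union[Path, str],
    dtype: Any = None,
    components: Optional[List[str]] = None,
    export_fn: Optional[Callable[..., Any]] = None,
) -> Path:
    """Hypothetical wrapper: export `model` into a dedicated directory."""
    if export_fn is None:
        # Assumed import path; adjust it to match your installation.
        from modelopt.torch.export import export_hf_checkpoint as export_fn

    out = Path(export_dir)
    # Create the target directory up front instead of relying on the
    # '/tmp' default shown in the signature.
    out.mkdir(parents=True, exist_ok=True)

    export_fn(
        model,
        dtype=dtype,                # None keeps the model's default data type
        export_dir=out,
        save_modelopt_state=False,  # skip saving the modelopt state_dict
        components=components,      # only used for diffusers pipelines
    )
    return out
```

With a quantized transformers model this might be called as export_quantized(model, "exported/llama-quantized"), where the directory name is only an illustration. For a diffusers pipeline, a list of component names can be passed through components to restrict the export to those components.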