unified_export_hf
Code that exports quantized Hugging Face models for deployment.
Functions
- export_hf_checkpoint(model, dtype=None, export_dir='/tmp', save_modelopt_state=False, components=None)
Export a quantized Hugging Face model checkpoint (transformers or diffusers).
This function automatically detects whether the model is from transformers or diffusers and applies the appropriate export logic.
- Parameters:
model (Module | DiffusionPipeline) – The full torch model to export. The actual quantized model may be a submodule. Supports both transformers models (e.g., LlamaForCausalLM) and diffusers models/pipelines (e.g., StableDiffusionPipeline, UNet2DConditionModel).
dtype (dtype | None) – The weight data type used when exporting the unquantized layers. If None, the model's default data type is used.
export_dir (Path | str) – The target export path.
save_modelopt_state (bool) – Whether to save the modelopt state_dict.
components (list[str] | None) – Only used for diffusers pipelines. Optional list of component names to export. If None, all quantized components are exported.
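The parameters above can be put together as in the following minimal sketch. It is illustrative only: the import path `modelopt.torch.export` is an assumption (this page names only the `unified_export_hf` module), `build_export_kwargs` and `export_quantized` are hypothetical helpers, and the component name `"unet"` and the export path are placeholders.

```python
from pathlib import Path


def build_export_kwargs(export_dir, dtype=None, save_modelopt_state=False, components=None):
    """Hypothetical helper: collect keyword arguments matching the documented signature."""
    if components is not None and not all(isinstance(c, str) for c in components):
        raise TypeError("components must be a list of component names or None")
    return {
        "dtype": dtype,                          # None: use the model's default data type
        "export_dir": Path(export_dir),          # the docs accept Path or str
        "save_modelopt_state": save_modelopt_state,
        "components": components,                # only used for diffusers pipelines
    }


def export_quantized(model, **kwargs):
    """Hypothetical wrapper around export_hf_checkpoint for an already quantized model."""
    # Assumed import path; imported lazily so this sketch loads without the package.
    from modelopt.torch.export import export_hf_checkpoint

    export_hf_checkpoint(model, **kwargs)


# Placeholder values: export only a component named "unet" of a diffusers pipeline.
kwargs = build_export_kwargs("/tmp/quantized_ckpt", components=["unet"])
```

With a quantized pipeline in hand, `export_quantized(pipeline, **kwargs)` would forward these arguments. For a transformers model, `components` would normally be left as None, since the parameter is documented as applying only to diffusers pipelines.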