unified_export_hf

Code that exports quantized Hugging Face models for deployment.

Functions

export_hf

Exports the torch model to the unified checkpoint and saves it to export_dir.

export_hf_checkpoint

Exports the torch model to the packed checkpoint with the original HF naming and saves it to export_dir.

export_hf(model, dtype=torch.float16, export_dir='/tmp')

Exports the torch model to the unified checkpoint and saves it to export_dir.

Parameters:
  • model (Module) – the torch model.

  • dtype (dtype) – the data type in which the weights of unquantized layers are exported.

  • export_dir (Path | str) – the target export path.
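A minimal sketch of calling export_hf with the parameters listed above, assuming the function is importable from modelopt.torch.export.unified_export_hf (that package path is an assumption; this page only names the module unified_export_hf). The wrapper export_for_deployment and its export_fn parameter are hypothetical: export_fn exists only so the wrapper can be exercised with a stub when torch and ModelOpt are not installed. Creating export_dir up front is a defensive choice, not something this page says export_hf requires.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Union


def export_for_deployment(
    model: Any,
    export_dir: Union[Path, str] = "/tmp",
    dtype: Any = None,
    export_fn: Optional[Callable[..., Any]] = None,
) -> Path:
    """Hypothetical wrapper: create export_dir, then call export_hf on model.

    Assumption: export_hf lives in modelopt.torch.export.unified_export_hf.
    export_fn is a hypothetical injection point used only for testing.
    """
    if export_fn is None:
        # Imported lazily so this module loads even without ModelOpt installed.
        from modelopt.torch.export.unified_export_hf import export_hf as export_fn
    if dtype is None:
        import torch

        # Documented default dtype for the weights of unquantized layers.
        dtype = torch.float16
    out = Path(export_dir)  # export_dir may be a Path or a str
    out.mkdir(parents=True, exist_ok=True)
    export_fn(model, dtype=dtype, export_dir=out)
    return out
```

With ModelOpt installed, export_for_deployment(quantized_model, "/tmp/unified_ckpt") would forward to export_hf(quantized_model, dtype=torch.float16, export_dir=Path("/tmp/unified_ckpt")); both the model variable and the path are illustrative.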

export_hf_checkpoint(model, dtype=torch.float16, export_dir='/tmp')

Exports the torch model to the packed checkpoint with the original HF naming and saves it to export_dir.

Parameters:
  • model (Module) – the torch model.

  • dtype (dtype) – the data type in which the weights of unquantized layers are exported.

  • export_dir (Path | str) – the target export path.

Return type:

Tuple[Dict[str, Any], Dict[str, Any]]

The packed checkpoint will be consumed by the TensorRT-LLM unified converter.
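A hedged sketch of inspecting the value export_hf_checkpoint returns. This page documents only the return type, Tuple[Dict[str, Any], Dict[str, Any]]; treating the first dict as the packed state dict (keyed by the original HF parameter names) and the second as quantization configuration is an assumption. summarize_packed_checkpoint is a hypothetical helper, not part of this module.

```python
from typing import Any, Dict, Tuple


def summarize_packed_checkpoint(
    result: Tuple[Dict[str, Any], Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper summarizing export_hf_checkpoint's return value.

    Assumption: result is (packed state dict, quantization config); the
    documented return type only guarantees two dicts with str keys.
    """
    if not (isinstance(result, tuple) and len(result) == 2):
        raise TypeError("expected a (state_dict, quant_config) tuple")
    state_dict, quant_config = result
    for part in (state_dict, quant_config):
        if not isinstance(part, dict) or not all(isinstance(k, str) for k in part):
            raise TypeError("both elements must be dicts with str keys")
    return {
        "num_entries": len(state_dict),
        # Keys are expected to follow the original HF naming of the model.
        "first_names": sorted(state_dict)[:3],
        "config_keys": sorted(quant_config),
    }
```

Assuming ModelOpt is installed, it could be called as summarize_packed_checkpoint(export_hf_checkpoint(model, export_dir="/tmp/packed_ckpt")), where the path is illustrative.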