unified_export_hf
Code that exports quantized Hugging Face models for deployment.
Functions
- export_hf: Exports the torch model to a unified checkpoint and saves it to export_dir.
- export_hf_checkpoint: Exports the torch model to a packed checkpoint with the original HF naming and saves it to export_dir.
- export_hf(model, dtype=torch.float16, export_dir='/tmp')
Exports the torch model to a unified checkpoint and saves it to export_dir.
- Parameters:
model (Module) – the torch model.
dtype (dtype) – the data type used to export the weights of unquantized layers.
export_dir (Path | str) – the target export path.
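A minimal usage sketch, assuming `export_hf` is importable from this module and `model` is an already-quantized Hugging Face torch model (the import path and the model setup are assumptions, not stated in this doc):

```python
from pathlib import Path

# Hedged sketch of the documented call; the real call is commented out
# because it requires a quantized HF model and this package installed:
#
#     import torch
#     export_hf(model, dtype=torch.float16, export_dir="/tmp/my_model")
#
# Per the signature, export_dir accepts either a Path or a str:
export_dir = Path("/tmp/my_model")
print(isinstance(export_dir, (Path, str)))  # → True
```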
- export_hf_checkpoint(model, dtype=torch.float16, export_dir='/tmp')
Exports the torch model to a packed checkpoint with the original HF naming and saves it to export_dir.
The packed checkpoint will be consumed by the TensorRT-LLM unified converter.
- Parameters:
model (Module) – the torch model.
dtype (dtype) – the data type used to export the weights of unquantized layers.
export_dir (Path | str) – the target export path.
- Returns:
post_state_dict – Dict containing the quantized weights.
quant_config – config information exported to hf_quant_cfg.json.
per_layer_quantization – Dict containing layer-wise quantization information, exported to quant_cfg.json in the mixed-precision case.
- Return type:
post_state_dict
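A sketch of how the side files named in the Returns section land in export_dir. The file names (hf_quant_cfg.json, quant_cfg.json) come from this doc; the config contents, the mixed-precision flag, and the real call shown in comments are hypothetical illustrations:

```python
import json
import tempfile
from pathlib import Path

# The real call (requires a quantized HF model and this package) would be:
#
#     post_state_dict = export_hf_checkpoint(
#         model, dtype=torch.float16, export_dir=export_dir
#     )
#
# Below we only mimic the documented side outputs: hf_quant_cfg.json from
# quant_config, plus quant_cfg.json in the mixed-precision case.
export_dir = Path(tempfile.mkdtemp())

quant_config = {"quantization": {"quant_algo": "FP8"}}  # hypothetical contents
(export_dir / "hf_quant_cfg.json").write_text(json.dumps(quant_config))

mixed_precision = True  # hypothetical flag, for illustration only
if mixed_precision:
    per_layer_quantization = {"layer.0": "FP8"}  # hypothetical contents
    (export_dir / "quant_cfg.json").write_text(json.dumps(per_layer_quantization))

print(sorted(p.name for p in export_dir.glob("*.json")))
# → ['hf_quant_cfg.json', 'quant_cfg.json']
```

Downstream, TensorRT-LLM's unified converter consumes the packed checkpoint directory produced this way.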