quantization_utils
Quantization utilities for LLMs.
Functions
- get_quant_config(precision, lm_head_precision='fp16')
Get the quantization configuration.
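A minimal sketch of what such a configuration helper might look like. The supported precision names (`fp8`, `int4_awq`) and the `lm_head_precision='fp16'` default come from the summary above; the returned dict layout and the `SUPPORTED_PRECISIONS` constant are assumptions for illustration, not the module's actual schema.

```python
# Hypothetical sketch of get_quant_config(); the dict keys below are
# illustrative assumptions, not the library's real config format.
SUPPORTED_PRECISIONS = ("fp8", "int4_awq")

def get_quant_config(precision, lm_head_precision="fp16"):
    """Return a quantization configuration for the given precision."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return {
        "precision": precision,
        "lm_head_precision": lm_head_precision,
    }

config = get_quant_config("fp8")
print(config["precision"])          # fp8
print(config["lm_head_precision"])  # fp16
```

Keeping the LM head at a higher precision (`fp16` by default) is a common choice, since quantizing the output projection tends to hurt generation quality more than quantizing the transformer body.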
- quantize(model, tokenizer, precision, lm_head_precision='fp16', dataset_dir=None, calib_size=512)
Quantize the PyTorch model to fp8 or int4_awq.
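The signature above suggests a calibrate-then-convert flow: run `calib_size` samples (optionally read from `dataset_dir`) through the model to collect activation statistics, then apply the chosen precision. The sketch below illustrates only that control flow; `FakeModel`, `load_calib_samples`, and every body line are stand-ins invented for this example, since the real function operates on an actual PyTorch model and tokenizer.

```python
# Hypothetical control-flow sketch of quantize(). All internals are stubs;
# only the parameter names and defaults come from the signature above.
def load_calib_samples(dataset_dir, calib_size):
    # Stub: a real implementation would read calibration text from dataset_dir.
    return [f"sample {i}" for i in range(calib_size)]

def quantize(model, tokenizer, precision, lm_head_precision="fp16",
             dataset_dir=None, calib_size=512):
    """Calibrate on calib_size samples, then mark the model as quantized."""
    if precision not in ("fp8", "int4_awq"):
        raise ValueError(f"unsupported precision: {precision!r}")
    for text in load_calib_samples(dataset_dir, calib_size):
        tokens = tokenizer(text)  # tokenize one calibration sample
        model(tokens)             # forward pass to gather activation stats
    # Apply the quantization settings (stubbed as attribute assignment).
    model.precision = precision
    model.lm_head_precision = lm_head_precision
    return model

class FakeModel:
    """Minimal callable stand-in for a PyTorch model, for this sketch only."""
    def __init__(self):
        self.calls = 0
    def __call__(self, tokens):
        self.calls += 1

m = quantize(FakeModel(), str.split, "int4_awq", calib_size=4)
print(m.precision, m.calls)  # int4_awq 4
```

The default `calib_size=512` reflects a common trade-off: enough samples for stable activation statistics, small enough that calibration stays much cheaper than training.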