quantization_utils

Quantization utilities for LLMs.

Functions

get_quant_config: Get the quantization configuration.

quantize: Quantize the PyTorch model to fp8 or int4_awq.

get_quant_config(precision, lm_head_precision='fp16')

Get the quantization configuration.
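
A minimal usage sketch. The import path (quantization_utils) is inferred from this page's title, and the shape of the returned configuration object is not documented here, so both are assumptions:

    # Hypothetical usage; the import path is an assumption based on the
    # module name at the top of this page.
    from quantization_utils import get_quant_config

    # Request an fp8 configuration while keeping the lm_head in fp16
    # (the default, per the signature above).
    quant_config = get_quant_config("fp8", lm_head_precision="fp16")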

quantize(model, tokenizer, precision, lm_head_precision='fp16', dataset_dir=None, calib_size=512)

Quantize the PyTorch model to fp8 or int4_awq.
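
A sketch of an end-to-end quantization call. Loading the model and tokenizer through Hugging Face Transformers is an assumption (any PyTorch model/tokenizer pair the function accepts would do), and the model name and calibration dataset path are placeholders:

    # Hypothetical usage; the model checkpoint and dataset path are
    # placeholders, not values documented on this page.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    from quantization_utils import quantize

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Quantize the weights with int4 AWQ while keeping the lm_head in fp16.
    # dataset_dir points at local calibration data; calib_size caps the
    # number of calibration samples (default 512, per the signature above).
    quantized_model = quantize(
        model,
        tokenizer,
        precision="int4_awq",
        lm_head_precision="fp16",
        dataset_dir="/path/to/calibration/data",
        calib_size=512,
    )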