quantization_utils

Quantization utilities for LLMs.

Functions

get_quant_config: Get the quantization configuration.

quantize: Quantize the PyTorch model to fp8 or int4_awq.

get_quant_config(precision, lm_head_precision='fp16')

Get the quantization configuration.
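
A minimal usage sketch. The import path (quantization_utils) is inferred from this page's title, and the shape of the returned configuration object is not documented here, so both are assumptions:

    # Hypothetical usage; the import path is an assumption based on the
    # module name at the top of this page.
    from quantization_utils import get_quant_config

    # Request an fp8 configuration while keeping the lm_head in fp16
    # (the default, per the signature above).
    quant_config = get_quant_config("fp8", lm_head_precision="fp16")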

quantize(model, tokenizer, precision, lm_head_precision='fp16', dataset_dir=None, calib_size=512)

Quantize the PyTorch model to fp8 or int4_awq.
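
A sketch of an end-to-end quantization call. Loading the model and tokenizer through Hugging Face Transformers is an assumption (any PyTorch model/tokenizer pair the function accepts would do), and the model name and calibration dataset path are placeholders:

    # Hypothetical usage; the model checkpoint and dataset path are
    # placeholders, not values documented on this page.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    from quantization_utils import quantize

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Quantize the weights with int4 AWQ while keeping the lm_head in fp16.
    # dataset_dir points at local calibration data; calib_size caps the
    # number of calibration samples (default 512, per the signature above).
    quantized_model = quantize(
        model,
        tokenizer,
        precision="int4_awq",
        lm_head_precision="fp16",
        dataset_dir="/path/to/calibration/data",
        calib_size=512,
    )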