quantization_utils
Quantization utilities for LLM models.
Functions
- get_quant_config(precision, lm_head_precision='fp16')
  Get the quantization configuration.
- quantize(model, tokenizer, precision, lm_head_precision='fp16', dataset_dir=None, calib_size=512)
  Quantize the PyTorch model to fp8 or int4_awq.
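To illustrate the shape such a helper might take, here is a minimal, hypothetical sketch of get_quant_config. The real module's config schema is not shown in this page, so the dict keys, the SUPPORTED_PRECISIONS set, and the group_size default are assumptions for illustration only.

```python
# Hypothetical sketch; the real quantization_utils config schema may differ.
SUPPORTED_PRECISIONS = {"fp8", "int4_awq"}  # assumed, per the quantize() docstring

def get_quant_config(precision, lm_head_precision="fp16"):
    """Return a simple quantization-config dict for the given precision."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"Unsupported precision: {precision!r}")
    config = {
        "quant_algo": precision.upper(),        # e.g. "FP8" or "INT4_AWQ"
        "lm_head_precision": lm_head_precision,  # lm_head often kept higher precision
    }
    if precision == "int4_awq":
        # AWQ quantizes weights in groups; 128 is a commonly used group size.
        config["group_size"] = 128
    return config
```

A caller would typically build the config first, then pass the same precision to quantize() along with a calibration dataset (dataset_dir, calib_size) when the method, such as int4_awq, requires calibration data.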