Quantization
- tensorrt_llm.quantization.quantize_and_export(*, model_dir, dtype, device, qformat, kv_cache_dtype, calib_size, batch_size, awq_block_size, output_dir, tp_size, pp_size, seed, max_seq_length)
Loads the model from model_dir, quantizes it with AMMO, and exports the quantized model as a TRT-LLM checkpoint.
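A hypothetical usage sketch of the keyword-only arguments this API accepts. All values below (the model path, "fp8" qformat, calibration settings, parallelism sizes) are illustrative assumptions, not documented defaults; only the parameter names come from the signature above.

```python
# Illustrative keyword arguments for quantize_and_export; every value
# here is an assumption chosen for the example, not an API default.
quant_kwargs = dict(
    model_dir="./llama-7b-hf",        # path of the model to load
    dtype="float16",                  # dtype used when loading weights
    device="cuda",                    # device to run calibration on
    qformat="fp8",                    # target quantization format
    kv_cache_dtype="fp8",             # dtype for the quantized KV cache
    calib_size=512,                   # number of calibration samples
    batch_size=1,                     # calibration batch size
    awq_block_size=128,               # block size (relevant to AWQ formats)
    output_dir="./llama-7b-fp8-ckpt", # where the TRT-LLM checkpoint is written
    tp_size=1,                        # tensor-parallel size of the checkpoint
    pp_size=1,                        # pipeline-parallel size of the checkpoint
    seed=1234,                        # seed for reproducible calibration
    max_seq_length=2048,              # max sequence length for calibration
)

# The call itself needs a GPU and a tensorrt_llm installation:
# from tensorrt_llm.quantization import quantize_and_export
# quantize_and_export(**quant_kwargs)
```

Since every parameter is keyword-only (note the leading `*` in the signature), positional calls will fail; collecting the arguments in a dict and unpacking them keeps long invocations readable.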