Quantization

class tensorrt_llm.quantization.QuantAlgo(value)[source]

Bases: StrEnum

An enumeration of the quantization algorithms supported by TensorRT-LLM.
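Because QuantAlgo derives from StrEnum, each member compares equal to its plain string value, which is convenient when algorithm names arrive from configs or CLI arguments. A minimal sketch of that pattern using only the standard library follows; the member names here are illustrative stand-ins, not the actual QuantAlgo members:

```python
from enum import Enum

# Stand-in for StrEnum (enum.StrEnum is stdlib only in Python 3.11+):
# subclassing str makes every member compare equal to its string value.
class DemoAlgo(str, Enum):
    FP8 = "FP8"              # illustrative names; the real QuantAlgo
    W4A16_AWQ = "W4A16_AWQ"  # members are defined by tensorrt_llm

# Members behave like strings, so config values map directly onto them.
print(DemoAlgo.FP8 == "FP8")       # True
print(DemoAlgo("W4A16_AWQ").name)  # W4A16_AWQ
```

This is why code can pass either the enum member or the raw string to APIs that expect a QuantAlgo value.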

class tensorrt_llm.quantization.QuantMode(value)[source]

Bases: IntFlag

A set of bit flags describing which quantization options are enabled.
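Because QuantMode derives from IntFlag, individual quantization options are bit flags that can be combined with `|` and tested with `&`. A minimal sketch of that pattern, with illustrative flag names rather than the real QuantMode flags:

```python
from enum import IntFlag, auto

class DemoMode(IntFlag):
    # Illustrative flags; the real QuantMode flags are defined by tensorrt_llm.
    INT4_WEIGHTS = auto()
    INT8_KV_CACHE = auto()
    PER_GROUP = auto()

# Combine options with bitwise OR, query them with bitwise AND.
mode = DemoMode.INT4_WEIGHTS | DemoMode.PER_GROUP
print(bool(mode & DemoMode.INT4_WEIGHTS))   # True
print(bool(mode & DemoMode.INT8_KV_CACHE))  # False
```

A single integer therefore encodes the full quantization configuration, which is cheap to pass around and serialize.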

tensorrt_llm.quantization.quantize_and_export(*, model_dir, dtype, device, qformat, kv_cache_dtype, calib_size, batch_size, awq_block_size, output_dir, tp_size, pp_size, seed, max_seq_length)[source]

Load the model from model_dir, quantize it with AMMO, and export the quantized model as a TensorRT-LLM checkpoint.
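All of quantize_and_export's parameters are keyword-only, so a call spells out every argument. The sketch below shows the shape of such a call; every value here (paths, qformat string, calibration settings) is an assumption chosen for illustration, not a documented default, and the actual call is guarded so the sketch runs even where tensorrt_llm is not installed:

```python
import importlib.util

# Illustrative keyword arguments; the values are assumptions, not defaults.
quantize_kwargs = dict(
    model_dir="./llama-7b-hf",   # directory holding the source model
    dtype="float16",
    device="cuda",
    qformat="int4_awq",          # quantization format handed to AMMO
    kv_cache_dtype="int8",
    calib_size=512,              # number of calibration samples
    batch_size=8,
    awq_block_size=128,
    output_dir="./trtllm-ckpt",  # where the TRT-LLM checkpoint is written
    tp_size=1,                   # tensor parallelism
    pp_size=1,                   # pipeline parallelism
    seed=42,
    max_seq_length=2048,
)

# Only invoke the real function when tensorrt_llm is available,
# since quantization needs the library and a GPU environment.
if importlib.util.find_spec("tensorrt_llm") is not None:
    from tensorrt_llm.quantization import quantize_and_export
    quantize_and_export(**quantize_kwargs)
```

The tp_size and pp_size arguments shard the exported checkpoint for tensor- and pipeline-parallel deployment, so they should match the parallelism of the engine you intend to build.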