=========================
Quick Start: Quantization
=========================

Quantization
------------

Quantization is an effective technique to reduce the memory footprint of deep learning models and
to accelerate inference.

ModelOpt's :meth:`mtq.quantize()` API enables users to quantize a model with advanced algorithms
such as SmoothQuant, AWQ, and more. ModelOpt supports both Post Training Quantization (PTQ) and
Quantization Aware Training (QAT).

.. tip::

    Please refer to :any:`quantization-formats` for details on the quantization formats supported
    by ModelOpt and their use cases.

PTQ for PyTorch models
----------------------

:meth:`mtq.quantize` requires the model, the appropriate quantization configuration, and a forward
loop as inputs. Here is a quick example of quantizing a model with INT8 SmoothQuant using
:meth:`mtq.quantize`:

.. code-block:: python

    import modelopt.torch.quantization as mtq

    # Set up the model
    model = get_model()

    # The quantization algorithm requires calibration data. Below is a rough example of how to
    # set up a calibration data loader with the desired calib_size.
    data_loader = get_dataloader(num_samples=calib_size)


    # Define the forward_loop function with the model as input. The data loader should be wrapped
    # inside the function.
    def forward_loop(model):
        for batch in data_loader:
            model(batch)


    # Quantize the model and perform calibration (PTQ)
    model = mtq.quantize(model, mtq.INT8_SMOOTHQUANT_CFG, forward_loop)

Refer to :any:`quantization-configs` for the quantization configurations available from ModelOpt.

Deployment
----------

The quantized model is just like a regular PyTorch model and is ready for evaluation or deployment.

Hugging Face or NeMo LLM models can be exported to TensorRT-LLM using ModelOpt. Please see the
:doc:`TensorRT-LLM Deployment <../deployment/1_tensorrt_llm>` guide for more details.

The model can also be exported to ONNX using
`torch.onnx.export <https://pytorch.org/docs/stable/onnx.html>`_.

--------------------------------

**Next Steps**

* Learn more about quantization and advanced usage of Model Optimizer quantization in the
  :doc:`Quantization guide <../guides/1_quantization>`.
* Check out the end-to-end examples for PTQ and QAT on GitHub
  `here <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_.
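As a quick preview of the QAT workflow mentioned above: in ModelOpt, QAT starts from the same
:meth:`mtq.quantize()` call shown in the PTQ example and is followed by ordinary fine-tuning.
Below is a minimal sketch of that flow; the training pieces (``train_loader``, ``optimizer``,
``loss_fn``, ``num_epochs``) are generic PyTorch placeholders rather than ModelOpt APIs, and the
choice of ``mtq.INT8_DEFAULT_CFG`` is illustrative.

.. code-block:: python

    # QAT begins with the same quantize-and-calibrate step as PTQ
    model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

    # ...followed by a regular PyTorch fine-tuning loop on the quantized model.
    # train_loader, optimizer, loss_fn, and num_epochs are placeholders for
    # whatever training setup the original model used.
    for epoch in range(num_epochs):
        for batch, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch), labels)
            loss.backward()
            optimizer.step()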
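Similarly, the ONNX path from the Deployment section uses the standard PyTorch exporter. Here is a
minimal sketch assuming an image model with a ``1x3x224x224`` input; the dummy-input shape and the
output file name are illustrative assumptions, not requirements.

.. code-block:: python

    import torch

    # Dummy input matching the model's expected input shape (illustrative values)
    dummy_input = torch.randn(1, 3, 224, 224)

    # Export the quantized model with the regular PyTorch ONNX exporter
    torch.onnx.export(model, dummy_input, "quantized_model.onnx")

Depending on the quantization format, export may require a sufficiently recent ONNX opset
(the ``opset_version`` argument of ``torch.onnx.export``).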