quantization

Modules

modelopt.onnx.quantization.calib_utils

Provides basic calibration utils.
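Calibration runs representative inputs through the model to estimate each tensor's dynamic range before quantization. One common way to do this, assumed here for illustration, is max calibration: record the largest absolute value (amax) seen across all calibration batches and derive a symmetric INT8 scale from it. The helper names below are hypothetical and this is not the calib_utils implementation, only a minimal sketch of the idea.

```python
from typing import Iterable, List


def max_calibrate(batches: Iterable[List[float]]) -> float:
    """Hypothetical helper: return the largest absolute value seen across all batches (amax)."""
    amax = 0.0
    for batch in batches:
        for x in batch:
            amax = max(amax, abs(x))
    return amax


def int8_symmetric_scale(amax: float) -> float:
    """Hypothetical helper: symmetric INT8 scale mapping [-amax, amax] onto [-127, 127]."""
    # Guard against an all-zero tensor, which would otherwise give a scale of 0.
    return amax / 127.0 if amax > 0 else 1.0


# Toy calibration data standing in for activations from several input batches.
batches = [[0.1, -0.5, 0.3], [1.27, -0.2], [-0.9, 0.05]]
amax = max_calibrate(batches)
scale = int8_symmetric_scale(amax)
```

Max calibration is sensitive to outliers; other range estimators (for example, entropy- or percentile-based ones) trade some clipping for finer resolution of typical values.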

modelopt.onnx.quantization.extensions

Module to load C++ extensions.

modelopt.onnx.quantization.fp8

Performs FP8 GEMM-only quantization of an ONNX model and returns the ONNX ModelProto.
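FP8 quantization scales a tensor into the representable range of an 8-bit floating-point format and rounds each value to the nearest FP8 number. The sketch below assumes the E4M3 format (3 mantissa bits, exponent bias 7, largest finite magnitude 448) and a per-tensor scale of amax / 448; it simulates the quantize-dequantize round trip in plain Python. The function names are hypothetical and the code is not taken from the fp8 module.

```python
import math

E4M3_MAX = 448.0          # largest finite magnitude in FP8 E4M3
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent (bias 7)
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Hypothetical helper: round x to the nearest E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with m in [0.5, 1), so the binade exponent is e - 1.
    _, e = math.frexp(mag)
    # Subnormals share the spacing of the smallest normal binade.
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)
    # round() is round-half-to-even; clamp again in case rounding overshoots 448.
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fp8_fake_quant(values, amax):
    """Hypothetical helper: scale into the E4M3 range, round, then scale back."""
    scale = amax / E4M3_MAX
    return [round_to_e4m3(v / scale) * scale for v in values]
```

Because the spacing between E4M3 values grows with magnitude, relative error stays roughly constant across the range, unlike INT8 where the step size is uniform.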

modelopt.onnx.quantization.graph_utils

Provides ONNX graph related utils for QDQ placement.

modelopt.onnx.quantization.gs_patching

Patches onnx_graphsurgeon to support explicitly setting a dtype.

modelopt.onnx.quantization.int4

Performs INT4 weight-only quantization (WoQ) on an ONNX model and returns the ONNX ModelProto.
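In weight-only quantization, only the weights are stored in low precision while activations stay in floating point. A common layout, assumed here, is symmetric blockwise INT4: weights are split into fixed-size blocks, each block gets its own scale (block amax / 7), and values are rounded and clamped to the signed range [-8, 7]. The function names and the flat-list representation are hypothetical; this is a sketch of the technique, not the int4 module's code.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7


def quantize_int4_blockwise(weights: List[float], block_size: int) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: symmetric per-block INT4 quantization of a flat weight vector."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]  # the last block may be shorter
        amax = max(abs(w) for w in block)
        scale = amax / INT4_MAX if amax > 0 else 1.0
        scales.append(scale)
        q.extend(max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in block)
    return q, scales


def dequantize_int4_blockwise(q: List[int], scales: List[float], block_size: int) -> List[float]:
    """Hypothetical helper: multiply each integer by the scale of the block it belongs to."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]
```

Smaller blocks track local weight magnitudes more closely at the cost of storing more scales.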

modelopt.onnx.quantization.int8

Performs INT8 quantization of an ONNX model and returns the ONNX ModelProto.

modelopt.onnx.quantization.operators

Additional or modified QDQ operators built on top of the ONNX Runtime (ORT) quantized operators.

modelopt.onnx.quantization.ort_patching

This module contains all the patched functions from ORT.

modelopt.onnx.quantization.ort_utils

Provides basic ORT inference utils; these should be replaced by modelopt.torch.ort_client.

modelopt.onnx.quantization.partitioning

Utilities related to partitioning the ONNX model to place QDQ nodes.

modelopt.onnx.quantization.qdq_utils

Various utils to support inserting Q/DQ nodes.
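In a QDQ graph, each quantized tensor passes through a QuantizeLinear node followed by a DequantizeLinear node. The arithmetic of that pair, as defined by the ONNX operator specification for int8 output, can be mirrored in plain Python: quantize computes saturate(round(x / scale) + zero_point) with round-half-to-even, and dequantize computes (q - zero_point) * scale. The helper names are hypothetical and this is not code from qdq_utils.

```python
from typing import List


def quantize_linear(x: List[float], scale: float, zero_point: int = 0) -> List[int]:
    """Hypothetical mirror of ONNX QuantizeLinear (int8): saturate(round(x / scale) + zero_point)."""
    # Python's round() uses round-half-to-even, matching the ONNX rounding rule.
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in x]


def dequantize_linear(q: List[int], scale: float, zero_point: int = 0) -> List[float]:
    """Hypothetical mirror of ONNX DequantizeLinear: (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]
```

Running a tensor through both functions shows the error a Q/DQ pair introduces: values are snapped to multiples of the scale and anything beyond the int8 range saturates.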

modelopt.onnx.quantization.quant_utils

Provides some basic utilities that can be used in quantize() methods.

modelopt.onnx.quantization.quantize(onnx_path)

Quantizes the provided ONNX model.

modelopt.onnx.quantization.trt_utils

This module contains TensorRT utils.

Model optimization subpackage for ONNX quantization.