quantization
Modules
Provides basic calibration utils.
Module to load C++ extensions.
Performs FP8 GEMM-only quantization of an ONNX model and returns the ONNX ModelProto.
Provides ONNX graph-related utils for QDQ placement.
Patches onnx_graphsurgeon to support explicitly setting a dtype.
Performs INT4 weight-only quantization (WoQ) on an ONNX model and returns the ONNX ModelProto.
Performs INT8 quantization of an ONNX model and returns the ONNX ModelProto.
Additional or modified QDQ operators on top of ORT quantized operators.
This module contains all the patched functions from ORT.
Provides basic ORT inference utils; should be replaced by modelopt.torch.ort_client.
Utilities related to partitioning the ONNX model to place QDQ nodes.
Various utils to support inserting Q/DQ nodes.
Provides some basic utilities that can be used in quantize() methods.
Quantizes the provided ONNX model. |
This module contains TensorRT utils. |
Model optimization subpackage for ONNX quantization.