export
ONNX export utilities.
Classes
FP8QuantExporter | Exporter for FP8 quantization.
INT4QuantExporter | Exporter for INT4 quantization.
INT8QuantExporter | Exporter for INT8 quantization.
MXFP8QuantExporter | Exporter for MXFP8 quantization.
NVFP4QuantExporter | Exporter for NVFP4 quantization.
ONNXQuantExporter | Base class for ONNX quantizer exporters.
- class FP8QuantExporter
Bases: ONNXQuantExporter
Exporter for FP8 quantization.
- static compress_weights(onnx_model)
Compresses the weights in the ONNX model for FP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static compute_scales(onnx_model)
Computes the scales for the weights in the ONNX model for FP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static post_process(onnx_model)
Post-processes the ONNX model for FP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static pre_process(onnx_model)
Pre-processes the ONNX model for FP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- class INT4QuantExporter
Bases: ONNXQuantExporter
Exporter for INT4 quantization.
- static compress_weights(onnx_model)
Compresses the weights in the ONNX model for INT4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static compute_scales(onnx_model)
Computes the scales for the weights in the ONNX model for INT4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static post_process(onnx_model)
Post-processes the ONNX model for INT4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static pre_process(onnx_model)
Pre-processes the ONNX model for INT4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- class INT8QuantExporter
Bases: ONNXQuantExporter
Exporter for INT8 quantization.
- static compress_weights(onnx_model)
Compresses the weights in the ONNX model for INT8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static compute_scales(onnx_model)
Computes the scales for the weights in the ONNX model for INT8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static post_process(onnx_model)
Post-processes the ONNX model for INT8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static pre_process(onnx_model)
Pre-processes the ONNX model for INT8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- class MXFP8QuantExporter
Bases: ONNXQuantExporter
Exporter for MXFP8 quantization.
- static compress_weights(onnx_model)
Compresses the weights in the ONNX model for MXFP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static compute_scales(onnx_model)
Computes the scales for the weights in the ONNX model for MXFP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static post_process(onnx_model)
Post-processes the ONNX model for MXFP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static pre_process(onnx_model)
Pre-processes the ONNX model for MXFP8 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- class NVFP4QuantExporter
Bases: ONNXQuantExporter
Exporter for NVFP4 quantization.
- static compress_weights(onnx_model)
Compresses the weights in the ONNX model for NVFP4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static compute_scales(onnx_model)
Computes the scales for the weights in the ONNX model for NVFP4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static post_process(onnx_model)
Post-processes the ONNX model for NVFP4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- static pre_process(onnx_model)
Pre-processes the ONNX model for NVFP4 quantization.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- class ONNXQuantExporter
Bases: ABC
Base class for ONNX quantizer exporters.
- abstract static compress_weights(onnx_model)
Compresses the weights in the ONNX model.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- abstract static compute_scales(onnx_model)
Computes the scales for the weights in the ONNX model.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- abstract static post_process(onnx_model)
Post-processes the ONNX model.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
- abstract static pre_process(onnx_model)
Pre-processes the ONNX model. Converts all DQ -> * -> op patterns to DQ -> op.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
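The "DQ -> * -> op" rewrite described for pre_process can be illustrated on a toy graph. The following is a minimal sketch, not the library's implementation: `Node`, `PASSTHROUGH_OPS`, and `collapse_dq_patterns` are hypothetical names, the toy nodes are plain dataclasses rather than ONNX protobuf objects, and the set of op types treated as the intermediate "*" (here `Cast` and `Identity`) is an assumption, since the reference does not list them. Graph outputs are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for an ONNX node (hypothetical, not onnx.NodeProto)."""
    name: str
    op_type: str
    inputs: list
    outputs: list


# Assumption: which op types count as the "*" in DQ -> * -> op.
PASSTHROUGH_OPS = {"Cast", "Identity"}


def collapse_dq_patterns(nodes):
    """Rewire consumers of DQ -> passthrough chains to read the DQ output directly."""
    producer = {out: n for n in nodes for out in n.outputs}

    def resolve(tensor):
        # Walk back through pass-through nodes. If the chain starts at a
        # DequantizeLinear node, return that node's output; otherwise keep
        # the original tensor name unchanged.
        current = tensor
        while True:
            node = producer.get(current)
            if node is None:
                return tensor
            if node.op_type == "DequantizeLinear":
                return current
            if node.op_type not in PASSTHROUGH_OPS or not node.inputs:
                return tensor
            current = node.inputs[0]

    rewired = [
        n if n.op_type in PASSTHROUGH_OPS
        else Node(n.name, n.op_type, [resolve(i) for i in n.inputs], list(n.outputs))
        for n in nodes
    ]

    # Repeatedly drop pass-through nodes whose outputs nobody reads any more.
    while True:
        used = {i for n in rewired for i in n.inputs}
        kept = [
            n for n in rewired
            if n.op_type not in PASSTHROUGH_OPS or any(o in used for o in n.outputs)
        ]
        if len(kept) == len(rewired):
            return kept
        rewired = kept


graph = [
    Node("dq", "DequantizeLinear", ["w_q", "scale"], ["w_dq"]),
    Node("cast", "Cast", ["w_dq"], ["w_cast"]),
    Node("mm", "MatMul", ["x", "w_cast"], ["y"]),
]
result = collapse_dq_patterns(graph)
```

After the call, the `MatMul` node reads `w_dq` directly and the now-unused `Cast` node is gone. A real pass would operate on the `ModelProto` graph and would also have to decide whether removing the intermediate node preserves numerics.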
- classmethod process_model(onnx_model)
Processes the ONNX model.
- Parameters:
onnx_model (ModelProto)
- Return type:
ModelProto
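One way to picture how a classmethod like process_model could tie the four abstract static methods together is the sketch below. It is an illustration under stated assumptions, not the library's code: `SketchExporter` and `TracingExporter` are hypothetical classes, a plain list stands in for a `ModelProto`, and the stage order (pre_process, compute_scales, compress_weights, post_process) is assumed, because the reference above does not state it.

```python
from abc import ABC, abstractmethod


class SketchExporter(ABC):
    """Hypothetical base class mirroring the documented method names."""

    @staticmethod
    @abstractmethod
    def pre_process(model):
        ...

    @staticmethod
    @abstractmethod
    def compute_scales(model):
        ...

    @staticmethod
    @abstractmethod
    def compress_weights(model):
        ...

    @staticmethod
    @abstractmethod
    def post_process(model):
        ...

    @classmethod
    def process_model(cls, model):
        # Assumed stage order; each stage takes a model and returns a model.
        model = cls.pre_process(model)
        model = cls.compute_scales(model)
        model = cls.compress_weights(model)
        model = cls.post_process(model)
        return model


class TracingExporter(SketchExporter):
    """Toy subclass: each stage appends its name to a list standing in for the model."""

    @staticmethod
    def pre_process(model):
        return model + ["pre_process"]

    @staticmethod
    def compute_scales(model):
        return model + ["compute_scales"]

    @staticmethod
    def compress_weights(model):
        return model + ["compress_weights"]

    @staticmethod
    def post_process(model):
        return model + ["post_process"]


trace = TracingExporter.process_model([])
```

Because every stage has the same "model in, model out" shape, a format-specific subclass only overrides the stages, and callers use the single process_model entry point regardless of the quantization format.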