export

ONNX export utilities.

Classes

`FP8QuantExporter`	Exporter for FP8 quantization.
`INT4QuantExporter`	Exporter for INT4 quantization.
`INT8QuantExporter`	Exporter for INT8 quantization.
`MXFP8QuantExporter`	Exporter for MXFP8 quantization.
`NVFP4QuantExporter`	Exporter for NVFP4 quantization.
`ONNXQuantExporter`	Base class for ONNX quantizer exporters.

class FP8QuantExporter

Bases: ONNXQuantExporter

Exporter for FP8 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for FP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for FP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto
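A common way to derive a per-tensor FP8 scale is to divide the largest absolute weight by the largest finite value of the target format (448 for E4M3). The sketch below assumes that convention and the E4M3 variant. The reference above does not state which formula or FP8 variant `compute_scales` uses, and `fp8_per_tensor_scale` is a hypothetical helper working on a plain list of floats rather than on a `ModelProto`.

```python
E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def fp8_per_tensor_scale(weights):
    """Return a scale such that weight / scale fits in the E4M3 range (assumed convention)."""
    amax = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    return amax / E4M3_MAX if amax > 0 else 1.0


weights = [0.5, -2.24, 1.0]
scale = fp8_per_tensor_scale(weights)
scaled = [w / scale for w in weights]  # values that would then be cast to FP8
```

Here the largest magnitude is 2.24, so every scaled value lands within the E4M3 range up to floating-point rounding.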

static post_process(onnx_model)

Post-processes the ONNX model for FP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for FP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

class INT4QuantExporter

Bases: ONNXQuantExporter

Exporter for INT4 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for INT4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto
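Compressing INT4 weights usually means packing two 4-bit values into each byte. The sketch below assumes signed values in [-8, 7], the first value of each pair in the low nibble, and an even number of values. The reference above does not specify the layout the exporter uses, and `pack_int4` and `unpack_int4` are hypothetical helpers.

```python
def pack_int4(values):
    """Pack signed 4-bit integers two per byte, low nibble first (assumed layout)."""
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (-8 <= lo <= 7 and -8 <= hi <= 7):
            raise ValueError("value outside the signed 4-bit range")
        # Mask to the two's-complement nibble; the first value goes in the low 4 bits.
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Sign-extend: nibbles 8..15 represent -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


weights = [-8, 7, 3, -1]
packed = pack_int4(weights)
restored = unpack_int4(packed)
```

Four weights occupy two bytes after packing, and unpacking recovers the original values.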

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for INT4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static post_process(onnx_model)

Post-processes the ONNX model for INT4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for INT4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

class INT8QuantExporter

Bases: ONNXQuantExporter

Exporter for INT8 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for INT8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for INT8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static post_process(onnx_model)

Post-processes the ONNX model for INT8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for INT8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

class MXFP8QuantExporter

Bases: ONNXQuantExporter

Exporter for MXFP8 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for MXFP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for MXFP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto
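MXFP8 differs from per-tensor FP8 in that scales are shared by small blocks of elements and are powers of two. The sketch below assumes the OCP Microscaling convention: blocks of 32 elements, E4M3 elements, and a shared exponent of floor(log2(amax)) minus the element format's maximum exponent. Whether `compute_scales` follows exactly this rule is not stated in the reference above, and `mxfp8_block_scales` is a hypothetical helper operating on a flat list of floats.

```python
import math

BLOCK_SIZE = 32  # elements sharing one scale (assumed, per the OCP MX formats)
E4M3_EMAX = 8    # exponent of the largest power of two in E4M3 (448 = 1.75 * 2**8)


def mxfp8_block_scales(weights):
    """Return one power-of-two scale per block of BLOCK_SIZE weights."""
    scales = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        if amax == 0:
            scales.append(1.0)  # arbitrary choice for an all-zero block
            continue
        shared_exp = math.floor(math.log2(amax)) - E4M3_EMAX
        scales.append(2.0 ** shared_exp)
    return scales


weights = [1.0] * 32 + [448.0] * 32
scales = mxfp8_block_scales(weights)
```

The first block (largest magnitude 1.0) gets a scale of 2**-8, and the second (largest magnitude 448.0) gets a scale of 1.0. Because the scale is rounded down to a power of two, some scaled values can exceed the E4M3 maximum and would need to be clamped when cast.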

static post_process(onnx_model)

Post-processes the ONNX model for MXFP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for MXFP8 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

class NVFP4QuantExporter

Bases: ONNXQuantExporter

Exporter for NVFP4 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for NVFP4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for NVFP4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static post_process(onnx_model)

Post-processes the ONNX model for NVFP4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for NVFP4 quantization.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

class ONNXQuantExporter

Bases: ABC

Base class for ONNX quantizer exporters.

abstract static compress_weights(onnx_model)

Compresses the weights in the ONNX model.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

abstract static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

abstract static post_process(onnx_model)

Post-processes the ONNX model.

Parameters: onnx_model (ModelProto)
Return type: ModelProto

abstract static pre_process(onnx_model)

Pre-processes the ONNX model. Converts all DQ -> * -> op patterns to DQ -> op.

Parameters: onnx_model (ModelProto)
Return type: ModelProto
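The DQ -> * -> op rewrite can be illustrated on a toy graph. This is a sketch under several assumptions, not the library's pass: the graph is a topologically ordered list of a hypothetical `Node` dataclass instead of a `ModelProto`, DQ is taken to mean `DequantizeLinear`, and only `Cast` and `Identity` are treated as the removable intermediate ops that "*" stands for.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node."""
    op_type: str
    inputs: list
    outputs: list


# Assumption: only these intermediate ops count as the "*" in DQ -> * -> op.
REMOVABLE = {"Cast", "Identity"}


def collapse_dq_chains(nodes):
    """Rewire consumers to read the DQ output directly and drop the bypassed nodes."""
    dq_outputs = {
        out for n in nodes if n.op_type == "DequantizeLinear" for out in n.outputs
    }
    rename = {}  # bypassed tensor name -> DQ output tensor name
    kept = []
    for node in nodes:  # assumes topological order
        inputs = [rename.get(name, name) for name in node.inputs]
        if node.op_type in REMOVABLE and len(inputs) == 1 and inputs[0] in dq_outputs:
            # Skip this node; its consumers will read the DQ output instead.
            rename[node.outputs[0]] = inputs[0]
            continue
        kept.append(Node(node.op_type, inputs, list(node.outputs)))
    return kept


graph = [
    Node("DequantizeLinear", ["w_q", "w_scale"], ["w_dq"]),
    Node("Cast", ["w_dq"], ["w_cast"]),
    Node("MatMul", ["x", "w_cast"], ["y"]),
]
result = collapse_dq_chains(graph)
```

After the rewrite, `MatMul` consumes `w_dq` directly and the `Cast` node is gone. A real pass would also have to account for the data type the removed node produced; the sketch ignores that.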

classmethod process_model(onnx_model)

Processes the ONNX model.

Parameters: onnx_model (ModelProto)
Return type: ModelProto
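The relationship between the four abstract stages and `process_model` can be sketched as a template method. This is a minimal, self-contained illustration rather than the library's implementation: the stage order (`pre_process`, `compute_scales`, `compress_weights`, `post_process`) is an assumption, and `SketchExporter`, `TracingExporter`, and the list-based stand-in for `ModelProto` are hypothetical names used only here.

```python
from abc import ABC, abstractmethod

# Stand-in for onnx.ModelProto so the sketch runs without onnx installed.
Model = list


class SketchExporter(ABC):
    """Hypothetical base class mirroring the documented method set."""

    @staticmethod
    @abstractmethod
    def pre_process(model: Model) -> Model: ...

    @staticmethod
    @abstractmethod
    def compute_scales(model: Model) -> Model: ...

    @staticmethod
    @abstractmethod
    def compress_weights(model: Model) -> Model: ...

    @staticmethod
    @abstractmethod
    def post_process(model: Model) -> Model: ...

    @classmethod
    def process_model(cls, model: Model) -> Model:
        # Assumed order; the reference above does not state it.
        model = cls.pre_process(model)
        model = cls.compute_scales(model)
        model = cls.compress_weights(model)
        return cls.post_process(model)


class TracingExporter(SketchExporter):
    """Hypothetical subclass whose stages only record that they ran."""

    @staticmethod
    def pre_process(model):
        return model + ["pre_process"]

    @staticmethod
    def compute_scales(model):
        return model + ["compute_scales"]

    @staticmethod
    def compress_weights(model):
        return model + ["compress_weights"]

    @staticmethod
    def post_process(model):
        return model + ["post_process"]


trace = TracingExporter.process_model([])
```

Because every stage is a static method and `process_model` is a class method, the pipeline runs on the class itself and no exporter instance is needed.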