export

ONNX export utilities.

Classes

FP8QuantExporter

Exporter for FP8 quantization.

INT4QuantExporter

Exporter for INT4 quantization.

INT8QuantExporter

Exporter for INT8 quantization.

MXFP8QuantExporter

Exporter for MXFP8 quantization.

NVFP4QuantExporter

Exporter for NVFP4 quantization.

ONNXQuantExporter

Base class for ONNX quantizer exporters.

class FP8QuantExporter

Bases: ONNXQuantExporter

Exporter for FP8 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for FP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for FP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static post_process(onnx_model)

Post-processes the ONNX model for FP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for FP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

class INT4QuantExporter

Bases: ONNXQuantExporter

Exporter for INT4 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for INT4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto
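
Example (illustrative only). One common way to compress INT4 weights is to pack two signed 4-bit values into each byte. The sketch below shows that generic scheme; it is not a confirmed description of what compress_weights does internally. The helper names pack_int4 and unpack_int4 are hypothetical, and the low-nibble-first order is an assumption.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("INT4 values must lie in [-8, 7]")
    nibbles = [v & 0x0F for v in values]  # two's-complement nibble
    if len(nibbles) % 2:
        nibbles.append(0)  # pad odd-length input with a zero nibble
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


def unpack_int4(packed, count):
    """Inverse of pack_int4; `count` drops any padding nibble."""
    out = []
    for byte in packed:
        for nib in (byte & 0x0F, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)  # restore the sign
    return out[:count]


weights = [-8, 7, 3, -1, 5]
packed = pack_int4(weights)          # 3 bytes instead of 5
restored = unpack_int4(packed, len(weights))
```

Packing halves the storage of the weight tensor, and the round trip through unpack_int4 is lossless for values already in the INT4 range.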

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for INT4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static post_process(onnx_model)

Post-processes the ONNX model for INT4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for INT4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

class INT8QuantExporter

Bases: ONNXQuantExporter

Exporter for INT8 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for INT8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for INT8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto
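
Example (illustrative only). A widely used scheme for INT8 weight scales is symmetric per-tensor scaling, where the largest weight magnitude maps to 127. The sketch below shows that arithmetic on a plain list; whether compute_scales uses per-tensor or per-channel scales is not stated here, and the function names are hypothetical.

```python
def symmetric_int8_scale(weights):
    """Per-tensor scale that maps the largest magnitude to 127."""
    max_abs = max(abs(w) for w in weights)
    return max_abs / 127.0 if max_abs else 1.0  # avoid a zero scale


def quantize_int8(weights, scale):
    """Round to nearest and clamp to the signed 8-bit range."""
    return [max(-128, min(127, round(w / scale))) for w in weights]


weights = [0.5, -1.27, 0.02, 1.0]
scale = symmetric_int8_scale(weights)   # largest magnitude is 1.27
q = quantize_int8(weights, scale)
```

Dequantizing is the reverse multiplication, q[i] * scale, so the scale has to be stored in the model alongside the quantized weights.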

static post_process(onnx_model)

Post-processes the ONNX model for INT8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for INT8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

class MXFP8QuantExporter

Bases: ONNXQuantExporter

Exporter for MXFP8 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for MXFP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for MXFP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto
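
Example (illustrative only). In the OCP Microscaling (MX) formats, each block of 32 elements shares one power-of-two scale. The sketch below derives such scales for FP8 E4M3 elements from the largest magnitude in each block. It is a minimal sketch of the general MX scheme, assuming E4M3 elements and a block size of 32; it is not taken from this exporter's implementation, and mx_block_scales is a hypothetical name.

```python
import math

BLOCK_SIZE = 32   # MX block size (assumed here)
E4M3_EMAX = 8     # exponent of the largest E4M3 normal value, 448 = 1.75 * 2**8


def mx_block_scales(values, block_size=BLOCK_SIZE, emax=E4M3_EMAX):
    """Return one power-of-two scale per block of `block_size` values."""
    scales = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0:
            scales.append(1.0)  # assumption: neutral scale for an all-zero block
            continue
        shared_exp = math.floor(math.log2(max_abs)) - emax
        scales.append(2.0 ** shared_exp)
    return scales


values = [1.0] * 32 + [1024.0] * 32
scales = mx_block_scales(values)   # one scale per 32-element block
```

Dividing each block by its scale brings the largest element within the E4M3 range, so blocks with very different magnitudes can each use the full FP8 precision.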

static post_process(onnx_model)

Post-processes the ONNX model for MXFP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for MXFP8 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

class NVFP4QuantExporter

Bases: ONNXQuantExporter

Exporter for NVFP4 quantization.

static compress_weights(onnx_model)

Compresses the weights in the ONNX model for NVFP4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model for NVFP4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static post_process(onnx_model)

Post-processes the ONNX model for NVFP4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

static pre_process(onnx_model)

Pre-processes the ONNX model for NVFP4 quantization.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

class ONNXQuantExporter

Bases: ABC

Base class for ONNX quantizer exporters.

abstract static compress_weights(onnx_model)

Compresses the weights in the ONNX model.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

abstract static compute_scales(onnx_model)

Computes the scales for the weights in the ONNX model.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

abstract static post_process(onnx_model)

Post-processes the ONNX model.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto

abstract static pre_process(onnx_model)

Pre-processes the ONNX model by converting every DQ -> * -> op pattern to DQ -> op, where DQ is a dequantize node and * stands for any intermediate node.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto
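
Example (illustrative only). The DQ -> * -> op rewrite described for pre_process can be pictured on a toy graph. The sketch below uses plain dicts rather than ONNX nodes and treats a single Cast as the intermediate node. Both choices are assumptions, as is the function name bypass_after_dq. A real exporter must also preserve whatever the removed node did, for example by folding it into the weights; the sketch only shows the rewiring.

```python
def bypass_after_dq(nodes, passthrough_ops=("Cast",)):
    """Rewire DQ -> passthrough -> op into DQ -> op on a toy node list.

    Each node is a dict with "op", "inputs" (tensor names) and "output".
    """
    producer = {n["output"]: n for n in nodes}
    rename = {}  # output of a dropped node -> DQ output that replaces it
    kept = []
    for node in nodes:
        src = producer.get(node["inputs"][0]) if node["inputs"] else None
        if node["op"] in passthrough_ops and src and src["op"] == "DequantizeLinear":
            rename[node["output"]] = src["output"]
            continue  # drop the intermediate node
        kept.append(node)
    # Return copies so the input graph is left untouched.
    return [dict(n, inputs=[rename.get(i, i) for i in n["inputs"]]) for n in kept]


graph = [
    {"op": "DequantizeLinear", "inputs": ["w_q", "w_scale"], "output": "w_dq"},
    {"op": "Cast", "inputs": ["w_dq"], "output": "w_cast"},
    {"op": "MatMul", "inputs": ["x", "w_cast"], "output": "y"},
]
rewritten = bypass_after_dq(graph)
```

After the rewrite, MatMul reads w_dq directly and the Cast node is gone, which is the DQ -> op shape named in the description.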

classmethod process_model(onnx_model)

Processes the ONNX model.

Parameters:

onnx_model (ModelProto)

Return type:

ModelProto
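
Example (illustrative only). The base class pairs four abstract static steps with a concrete process_model classmethod, which suggests a template-method design. The sketch below reproduces that shape with stand-in classes so it runs without ONNX installed. ExporterSketch and TracingExporter are hypothetical names, and the order in which process_model calls the steps is an assumption, since this reference does not state it.

```python
from abc import ABC, abstractmethod


class ExporterSketch(ABC):
    """Hypothetical stand-in for ONNXQuantExporter; `model` is any object."""

    @staticmethod
    @abstractmethod
    def pre_process(model): ...

    @staticmethod
    @abstractmethod
    def compute_scales(model): ...

    @staticmethod
    @abstractmethod
    def compress_weights(model): ...

    @staticmethod
    @abstractmethod
    def post_process(model): ...

    @classmethod
    def process_model(cls, model):
        # Assumed step order; not confirmed by the reference text.
        model = cls.pre_process(model)
        model = cls.compute_scales(model)
        model = cls.compress_weights(model)
        return cls.post_process(model)


class TracingExporter(ExporterSketch):
    """Toy subclass: each step appends its own name to a list."""

    @staticmethod
    def pre_process(model):
        return model + ["pre_process"]

    @staticmethod
    def compute_scales(model):
        return model + ["compute_scales"]

    @staticmethod
    def compress_weights(model):
        return model + ["compress_weights"]

    @staticmethod
    def post_process(model):
        return model + ["post_process"]


trace = TracingExporter.process_model([])
```

With this layout a new precision only needs to implement the four static steps, and callers use the same process_model entry point for every exporter.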