export_onnx

Utility to export a quantized torch model to quantized ONNX.

Functions

export_fp8

Export quantized model to FP8 ONNX.

export_int8

Export quantized model to INT8 ONNX.

export_fp8(g, inputs, amax, trt_high_precision_dtype)

Export quantized model to FP8 ONNX.

Parameters:
  • g (GraphContext) – ONNX graph context used to emit the quantization nodes.

  • inputs (Value) – Input tensor value to quantize.

  • amax (float) – Calibrated maximum absolute value, used to derive the quantization scale.

  • trt_high_precision_dtype (str) – Data type TensorRT should use for the surrounding high-precision computation.

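The scale encoded by an FP8 export conventionally maps `amax` onto the largest finite E4M3 value, 448. A minimal sketch of that scale derivation, assuming this convention (the helper name is illustrative, not part of this module's API):

```python
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_scale(amax: float) -> float:
    """Scale such that +/-amax maps onto the full E4M3 range (illustrative)."""
    return amax / E4M3_MAX

print(fp8_scale(448.0))  # an amax equal to E4M3_MAX yields a scale of 1.0
```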
export_int8(g, inputs, amax, num_bits, unsigned, narrow_range, trt_high_precision_dtype)

Export quantized model to INT8 ONNX.

Parameters:
  • g (GraphContext) – ONNX graph context used to emit the quantization nodes.

  • inputs (Value) – Input tensor value to quantize.

  • amax (Tensor) – Calibrated maximum absolute value(s), used to derive the quantization scale.

  • num_bits (int) – Number of quantization bits (typically 8).

  • unsigned (bool) – If True, use an unsigned integer range.

  • narrow_range (bool) – If True, use the symmetric narrow range (e.g. [-127, 127] instead of [-128, 127]).

  • trt_high_precision_dtype (str) – Data type TensorRT should use for the surrounding high-precision computation.
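An INT8 export is typically realized as an ONNX QuantizeLinear/DequantizeLinear pair. The sketch below is an illustrative pure-Python reference for the quantize–dequantize round trip those nodes encode, showing how `num_bits`, `unsigned`, and `narrow_range` determine the integer range and how `amax` determines the scale (function names are hypothetical, not this module's API):

```python
def int_bounds(num_bits: int = 8, unsigned: bool = False, narrow_range: bool = False):
    """Integer range implied by the quantization flags (illustrative)."""
    if unsigned:
        return 0, 2**num_bits - 1                      # e.g. [0, 255]
    lo = -(2 ** (num_bits - 1)) + (1 if narrow_range else 0)
    return lo, 2 ** (num_bits - 1) - 1                 # e.g. [-127, 127] when narrow

def qdq(x: float, amax: float, num_bits: int = 8, unsigned: bool = False,
        narrow_range: bool = False) -> float:
    """Quantize-dequantize round trip: x -> integer grid -> float."""
    lo, hi = int_bounds(num_bits, unsigned, narrow_range)
    scale = amax / hi                                  # amax maps to the top of the range
    q = max(lo, min(hi, round(x / scale)))             # quantize with clamping
    return q * scale                                   # dequantize

print(int_bounds(8, False, True))   # -> (-127, 127)
print(qdq(0.5, amax=1.0))           # 0.5 rounds to 64/127, about 0.5039
```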