int8

Performs INT8 quantization of an ONNX model and returns the resulting ONNX ModelProto.

Functions

quantize

Applies INT8 quantization to an ONNX file using compiler-friendly heuristics.

quantize(onnx_path, calibration_method='entropy', calibration_data_reader=None, calibration_cache_path=None, calibration_shapes=None, calibration_eps=['cpu', 'cuda:0', 'trt'], op_types_to_quantize=None, op_types_to_exclude=None, nodes_to_quantize=None, nodes_to_exclude=None, use_external_data_format=False, intermediate_generated_files=[], trt_extra_plugin_lib_paths=None, high_precision_dtype='fp32', passes=['concat_elimination'], **kwargs)


Quantization is supported for the following op types: 'Add', 'AveragePool', 'BatchNormalization', 'Clip', 'Conv', 'ConvTranspose', 'Gemm', 'GlobalAveragePool', 'MatMul', 'MaxPool', 'Mul'.
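Because only the op types listed above are supported, it can help to check a list before passing it as op_types_to_quantize. The following is a minimal sketch of such a pre-check; the names SUPPORTED_INT8_OP_TYPES, unsupported_op_types and validate_op_types are hypothetical helpers, not part of this module, and whether quantize itself rejects or silently ignores other op types is not stated here.

```python
# Op types this page lists as supported for INT8 quantization.
# Hypothetical constant for illustration; not exported by the library.
SUPPORTED_INT8_OP_TYPES = frozenset({
    "Add", "AveragePool", "BatchNormalization", "Clip", "Conv",
    "ConvTranspose", "Gemm", "GlobalAveragePool", "MatMul", "MaxPool", "Mul",
})


def unsupported_op_types(requested):
    """Return the requested op types missing from the supported list, sorted."""
    return sorted(set(requested) - SUPPORTED_INT8_OP_TYPES)


def validate_op_types(requested):
    """Raise ValueError if any requested op type is not in the supported list.

    Returns the requested op types as a list so the result can be passed
    straight to the op_types_to_quantize parameter.
    """
    bad = unsupported_op_types(requested)
    if bad:
        raise ValueError(f"Unsupported op types for INT8 quantization: {bad}")
    return list(requested)
```

For example, unsupported_op_types(["Conv", "Softmax"]) returns ["Softmax"], while validate_op_types(["Conv", "MatMul"]) returns the list unchanged.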

Parameters:
  • onnx_path (str)

  • calibration_method (str)

  • calibration_data_reader (CalibrationDataReader)

  • calibration_cache_path (str | None)

  • calibration_shapes (str | None)

  • calibration_eps (list[str])

  • op_types_to_quantize (list[str] | None)

  • op_types_to_exclude (list[str] | None)

  • nodes_to_quantize (list[str] | None)

  • nodes_to_exclude (list[str] | None)

  • use_external_data_format (bool)

  • intermediate_generated_files (list[str])

  • trt_extra_plugin_lib_paths (str | None)

  • high_precision_dtype (str)

  • passes (list[str])

Return type:

ModelProto
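A minimal end-to-end sketch follows, assuming the function is importable as modelopt.onnx.quantization.int8.quantize (inferred from this page's title, not confirmed by it) and that the onnx package is installed. RandomCalibrationReader and quantize_to_file are hypothetical names. The reader follows the get_next() protocol of onnxruntime's CalibrationDataReader, returning one {input_name: array} feed per call and then None. It is duck-typed here so it runs without onnxruntime. In real use you would subclass onnxruntime.quantization.CalibrationDataReader, since the parameter is typed that way, and feed representative data rather than random noise.

```python
import numpy as np


class RandomCalibrationReader:
    """Duck-typed stand-in for onnxruntime.quantization.CalibrationDataReader.

    Yields `num_batches` feeds of random float32 data for a single named
    input, then returns None to signal that calibration data is exhausted.
    """

    def __init__(self, input_name, shape, num_batches=8, seed=0):
        rng = np.random.default_rng(seed)
        self._feeds = iter(
            [{input_name: rng.random(shape, dtype=np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        # Next {input_name: ndarray} feed, or None once all batches are consumed.
        return next(self._feeds, None)


def quantize_to_file(onnx_path, output_path, reader):
    # Imported lazily so the reader above can be used without ModelOpt installed.
    # The module path below is an assumption.
    import onnx
    from modelopt.onnx.quantization.int8 import quantize

    model = quantize(
        onnx_path,
        calibration_data_reader=reader,
        calibration_eps=["cpu"],      # CPU only; the default also lists "cuda:0" and "trt"
        op_types_to_exclude=["Add"],  # illustrative: leave Add nodes unquantized
    )
    onnx.save(model, output_path)     # quantize returns an ONNX ModelProto
    return model
```

A typical call would be quantize_to_file("model.onnx", "model.quant.onnx", RandomCalibrationReader("input", (1, 3, 224, 224))), where the file names and the input name "input" are placeholders for your model's actual values.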