int8
Performs INT8 quantization of an ONNX model and returns the ONNX ModelProto.
Functions
quantize: Applies INT8 quantization to an ONNX file using compiler-friendly heuristics.
- quantize(onnx_path, calibration_method='entropy', calibration_data_reader=None, calibration_cache_path=None, calibration_shapes=None, calibration_eps=['cuda:0', 'cpu', 'trt'], op_types_to_quantize=None, op_types_to_exclude=None, nodes_to_quantize=None, nodes_to_exclude=None, use_external_data_format=True, intermediate_generated_files=[], verbose=False, trt_extra_plugin_lib_paths=None, high_precision_dtype='fp32', **kwargs)
Applies INT8 quantization to an ONNX file using compiler-friendly heuristics.
Quantization is supported for the following op types: [‘Add’, ‘AveragePool’, ‘BatchNormalization’, ‘Clip’, ‘Conv’, ‘ConvTranspose’, ‘Gemm’, ‘GlobalAveragePool’, ‘MatMul’, ‘MaxPool’, ‘Mul’].
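As an illustration of the supported op type list, the sketch below splits a model's op types into those on the list and those not on it. The helper name `split_by_int8_support` is hypothetical and not part of the library. The page does not say how `quantize` treats op types outside the list, so "unsupported" here only means "not listed".

```python
# Op types listed as supported in the description above.
SUPPORTED_INT8_OP_TYPES = frozenset({
    "Add", "AveragePool", "BatchNormalization", "Clip", "Conv",
    "ConvTranspose", "Gemm", "GlobalAveragePool", "MatMul", "MaxPool", "Mul",
})


def split_by_int8_support(op_types):
    """Hypothetical helper: return (listed, not_listed) as sorted lists of distinct op types."""
    seen = set(op_types)
    listed = sorted(seen & SUPPORTED_INT8_OP_TYPES)
    not_listed = sorted(seen - SUPPORTED_INT8_OP_TYPES)
    return listed, not_listed


# With the onnx package installed, the op types of a model could be gathered as:
#   model = onnx.load("model.onnx")
#   op_types = [node.op_type for node in model.graph.node]
listed, not_listed = split_by_int8_support(["Conv", "Relu", "MatMul", "Conv", "Softmax"])
print(listed)      # ['Conv', 'MatMul']
print(not_listed)  # ['Relu', 'Softmax']
```

The listed names could then be passed as `op_types_to_quantize` if only a subset should be quantized.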
- Parameters:
onnx_path (str) –
calibration_method (str) –
calibration_data_reader (CalibrationDataReader) –
calibration_cache_path (str) –
calibration_shapes (str) –
calibration_eps (List[str]) –
op_types_to_quantize (List[str]) –
op_types_to_exclude (List[str]) –
nodes_to_quantize (List[str]) –
nodes_to_exclude (List[str]) –
use_external_data_format (bool) –
intermediate_generated_files (List[str]) –
verbose (bool) –
trt_extra_plugin_lib_paths (str) –
high_precision_dtype (str) –
- Return type:
ModelProto
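A minimal usage sketch, using only parameter names and defaults from the signature above. The import path `modelopt.onnx.quantization.int8` is an assumption (this page only names the module `int8`), and the helpers `build_quantize_kwargs` and `quantize_int8` are hypothetical. The two list-valued defaults in the signature are mutable objects created once in Python, so the sketch passes fresh lists on each call in case the function modifies them.

```python
def build_quantize_kwargs(onnx_path, **overrides):
    """Hypothetical helper: assemble keyword arguments for quantize()."""
    kwargs = {
        "onnx_path": onnx_path,
        "calibration_method": "entropy",              # default from the signature
        "calibration_eps": ["cuda:0", "cpu", "trt"],  # fresh list per call
        "intermediate_generated_files": [],           # fresh list per call
        "use_external_data_format": True,             # default from the signature
        "high_precision_dtype": "fp32",               # default from the signature
        "verbose": False,
    }
    kwargs.update(overrides)
    return kwargs


def quantize_int8(onnx_path, **overrides):
    """Hypothetical wrapper: call quantize() and return the resulting ModelProto."""
    # ASSUMPTION: the import path below is not stated on this page.
    from modelopt.onnx.quantization.int8 import quantize

    return quantize(**build_quantize_kwargs(onnx_path, **overrides))


# Inspect the arguments without running quantization.
kwargs = build_quantize_kwargs("model.onnx", op_types_to_quantize=["Conv", "MatMul"])
print(sorted(kwargs))
```

Calling `quantize_int8("model.onnx", verbose=True)` would then return the ModelProto described under "Return type", provided the library is installed and the assumed import path is correct.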