ort_utils

Provides basic ORT inference utils; should be replaced by modelopt.torch.ort_client.

Functions

configure_ort

Configures and patches ORT to support ModelOpt ONNX quantization.

create_inference_session

Create an ORT InferenceSession.

get_quantizable_op_types

Returns a set of quantizable op types.

configure_ort(op_types, op_types_to_quantize, trt_extra_plugin_lib_paths=None, calibration_eps=None)

Configures and patches ORT to support ModelOpt ONNX quantization.

Parameters:
  • op_types (List[str]) –

  • op_types_to_quantize (List[str]) –

  • trt_extra_plugin_lib_paths (str) –

  • calibration_eps (List[str]) –
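
The sketch below shows how this function might be called before quantization, assuming the module is importable as modelopt.onnx.quantization.ort_utils. The op-type lists, the provider string, and the ignored return value are illustrative assumptions, not values taken from the library.

    # Hypothetical usage sketch; all values below are placeholders.
    from modelopt.onnx.quantization.ort_utils import configure_ort

    # Op types present in the model graph (placeholder list).
    op_types = ["Conv", "MatMul", "Relu", "Add"]
    # Subset of op types the user wants quantized (placeholder list).
    op_types_to_quantize = ["Conv", "MatMul"]

    # Patch ORT so ModelOpt ONNX quantization can use these settings.
    # The return value, if any, is not documented here, so it is ignored.
    configure_ort(
        op_types,
        op_types_to_quantize,
        trt_extra_plugin_lib_paths=None,  # optional custom TensorRT plugin libraries
        calibration_eps=["cpu"],  # placeholder; accepted provider strings are defined by the library
    )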

create_inference_session(onnx_path_or_model, calibration_eps)

Create an ORT InferenceSession.

Parameters:
  • onnx_path_or_model (str | bytes) –

  • calibration_eps (List[str]) –
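
A minimal sketch of creating and using the resulting session, under the same module-path assumption as above; the model path and provider string are placeholders. Because the documentation states that the function creates an ORT InferenceSession, standard onnxruntime session APIs should apply to the returned object.

    # Hypothetical usage sketch; "model.onnx" and "cpu" are placeholder values.
    from modelopt.onnx.quantization.ort_utils import create_inference_session

    # onnx_path_or_model accepts either a path (str) or serialized model bytes.
    session = create_inference_session("model.onnx", calibration_eps=["cpu"])

    # The result is an ORT InferenceSession, so standard ORT calls work on it.
    input_names = [inp.name for inp in session.get_inputs()]
    print(input_names)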

get_quantizable_op_types(op_types_to_quantize)

Returns a set of quantizable op types.

Note: This function should be called after quantize._configure_ort() has been called once. It returns the quantizable op types either from the user-supplied parameter or from modelopt.onnx's default quantizable ops setting.

Parameters:
  • op_types_to_quantize (List[str]) –

Return type:
  List[str]
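
A short sketch under the same module-path assumption; it presumes configure_ort (or quantize._configure_ort) has already run, per the note above. Whether an empty list is the sentinel that triggers the default-op-types fallback is an assumption.

    # Hypothetical usage sketch.
    from modelopt.onnx.quantization.ort_utils import get_quantizable_op_types

    # An explicit selection should be returned as the quantizable op types.
    print(get_quantizable_op_types(["Conv", "MatMul"]))

    # With no user selection, the default quantizable ops from modelopt.onnx apply
    # (the empty-list sentinel here is an assumption, not documented above).
    print(get_quantizable_op_types([]))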