ort_utils

Provides basic ORT inference utilities; should be replaced by modelopt.torch.ort_client.

Functions

configure_ort

Configures and patches ORT to support ModelOpt ONNX quantization.

create_inference_session

Create an ORT InferenceSession.

get_quantizable_op_types

Returns a set of quantizable op types.

update_trt_ep_support

Checks whether TRT should be enabled or disabled and updates the list of calibration EPs accordingly.

configure_ort(op_types, op_types_to_quantize, trt_extra_plugin_lib_paths=None, calibration_eps=None, calibrate_per_node=False, custom_ops_to_quantize=[])

Configures and patches ORT to support ModelOpt ONNX quantization.

Parameters:
  • op_types (list[str])

  • op_types_to_quantize (list[str])

  • trt_extra_plugin_lib_paths (list[str] | None)

  • calibration_eps (list[str] | None)

  • calibrate_per_node (bool)

  • custom_ops_to_quantize (list[str])

create_inference_session(onnx_path_or_model, calibration_eps, input_shapes_profile=None)

Create an ORT InferenceSession.

Parameters:
  • onnx_path_or_model (str | bytes)

  • calibration_eps (list[str])

  • input_shapes_profile (Sequence[dict[str, str]] | None)
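As a rough illustration of what a calibration_eps list might translate to, the sketch below maps short execution provider names to the provider entries that onnxruntime.InferenceSession accepts in its providers argument. The short names ("cpu", "cuda:<id>", "trt") and the helper to_ort_providers are assumptions made for this example; they are not documented on this page and this is not necessarily how create_inference_session works internally.

```python
# Hypothetical helper, not part of ort_utils. The short EP names ("cpu",
# "cuda:<id>", "trt") are assumed for illustration only.

def to_ort_providers(calibration_eps: list[str]) -> list:
    """Translate short EP names into ONNX Runtime provider entries."""
    providers = []
    for ep in calibration_eps:
        # Split an optional device index, e.g. "cuda:1" -> ("cuda", "1").
        name, _, device = ep.partition(":")
        if name == "cpu":
            providers.append("CPUExecutionProvider")
        elif name == "cuda":
            device_id = int(device) if device else 0
            # ORT accepts (provider_name, options_dict) tuples.
            providers.append(("CUDAExecutionProvider", {"device_id": device_id}))
        elif name == "trt":
            providers.append("TensorrtExecutionProvider")
        else:
            raise ValueError(f"Unknown execution provider: {ep!r}")
    return providers


if __name__ == "__main__":
    print(to_ort_providers(["trt", "cuda:0", "cpu"]))
```

A list produced this way could be passed as onnxruntime.InferenceSession(model_path, providers=providers), where ORT treats the order of the list as a priority order.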

get_quantizable_op_types(op_types_to_quantize)

Returns a set of quantizable op types.

Note: This function should be called only after quantize._configure_ort() has been called once. It returns quantizable op types taken either from the user-supplied parameter or from modelopt.onnx’s default quantizable ops setting.

Parameters:
  • op_types_to_quantize (list[str])

Return type:

list[str]
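The "user-supplied or default" behavior described in the note can be sketched as a simple fallback. The default list and the function name resolve_quantizable_op_types below are hypothetical; the real defaults live in modelopt.onnx and are not listed on this page. Treating an empty list the same as None is also an assumption.

```python
# Hypothetical default list for illustration only; the actual defaults are
# defined by modelopt.onnx and are not shown on this page.
DEFAULT_QUANTIZABLE_OPS = ["Conv", "MatMul", "Gemm"]


def resolve_quantizable_op_types(op_types_to_quantize, defaults=DEFAULT_QUANTIZABLE_OPS):
    """Return the user's op types if any were given, else the defaults."""
    if op_types_to_quantize:
        # Copy so callers cannot mutate the list that was passed in.
        return list(op_types_to_quantize)
    # None or an empty list falls back to the defaults (an assumption).
    return list(defaults)


if __name__ == "__main__":
    print(resolve_quantizable_op_types(["Add", "Mul"]))
    print(resolve_quantizable_op_types(None))
```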

update_trt_ep_support(calibration_eps, has_dds_op, has_custom_op, trt_plugins)

Checks whether TRT should be enabled or disabled and updates the list of calibration EPs accordingly.

Parameters:
  • calibration_eps (list[str])

  • has_dds_op (bool)

  • has_custom_op (bool)

  • trt_plugins (list[str])
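This page does not state the rules by which has_dds_op, has_custom_op, and trt_plugins determine whether TRT is enabled, so the sketch below covers only the second half of the description: updating an EP list once that decision has been made. The helper apply_trt_decision, the "trt" short name, and the choice to place TRT first in the list are all assumptions for illustration.

```python
# Hypothetical helper, not part of ort_utils. How the enable/disable decision
# is derived from has_dds_op, has_custom_op and trt_plugins is not specified
# on this page, so it is passed in as a plain boolean here.

def apply_trt_decision(calibration_eps: list[str], enable_trt: bool) -> list[str]:
    """Return a new EP list with "trt" added or removed."""
    # Drop any existing "trt" entry so the result never contains duplicates.
    eps = [ep for ep in calibration_eps if ep != "trt"]
    if enable_trt:
        # Assumed convention: earlier entries take priority, so TRT goes first.
        eps.insert(0, "trt")
    return eps


if __name__ == "__main__":
    print(apply_trt_decision(["cuda:0", "cpu"], enable_trt=True))
    print(apply_trt_decision(["trt", "cpu"], enable_trt=False))
```

Returning a new list rather than mutating the argument keeps the caller's original list intact; whether update_trt_ep_support itself mutates or returns its input is not stated here.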