ort_utils

Provides basic ONNX Runtime (ORT) inference utilities; should be replaced by modelopt.torch.ort_client.

Functions

`configure_ort`	Configures and patches ORT to support ModelOpt ONNX quantization.
`create_inference_session`	Create an ORT InferenceSession.
`get_quantizable_op_types`	Returns a set of quantizable op types.
`update_trt_ep_support`	Checks whether TRT should be enabled or disabled and updates the list of calibration EPs accordingly.

configure_ort(op_types, op_types_to_quantize, trt_extra_plugin_lib_paths=None, calibration_eps=None, calibrate_per_node=False, custom_ops_to_quantize=[])

Configures and patches ORT to support ModelOpt ONNX quantization.

Parameters:

op_types (list[str])
op_types_to_quantize (list[str])
trt_extra_plugin_lib_paths (list[str] | None)
calibration_eps (list[str] | None)
calibrate_per_node (bool)
custom_ops_to_quantize (list[str])

create_inference_session(onnx_path_or_model, calibration_eps, input_shapes_profile=None)

Create an ORT InferenceSession.

Parameters:

onnx_path_or_model (str | bytes)
calibration_eps (list[str])
input_shapes_profile (Sequence[dict[str, str]] | None)
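The `Sequence[dict[str, str]]` type of `input_shapes_profile` suggests one dictionary of string options per execution provider. The following is a minimal sketch of how such a value might be assembled, not the library's documented usage. It assumes that the entries line up with `calibration_eps`, that the EP names `"trt"` and `"cpu"` are accepted, and that the TensorRT entry carries ONNX Runtime's `trt_profile_*_shapes` provider options. The helper `format_shapes` and the input name `input_ids` are hypothetical.

```python
def format_shapes(shapes):
    """Render {"name": (d0, d1, ...)} as "name:d0xd1,...", the shape-profile
    string format used by ONNX Runtime's TensorRT execution provider."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


# Hypothetical dynamic-shape ranges for a single input named "input_ids".
trt_profile = {
    "trt_profile_min_shapes": format_shapes({"input_ids": (1, 1)}),
    "trt_profile_opt_shapes": format_shapes({"input_ids": (1, 128)}),
    "trt_profile_max_shapes": format_shapes({"input_ids": (1, 512)}),
}

# Assumption: one options dict per EP, in the same order as calibration_eps.
calibration_eps = ["trt", "cpu"]
input_shapes_profile = [trt_profile, {}]
```

Under these assumptions, the two lists would then be passed as `create_inference_session(onnx_path_or_model, calibration_eps, input_shapes_profile)`.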

get_quantizable_op_types(op_types_to_quantize)

Returns a set of quantizable op types.

Note: Call this function only after quantize._configure_ort() has been called once. It returns quantizable op types taken either from the user-supplied parameter or from modelopt.onnx’s default quantizable ops setting.

Parameters: op_types_to_quantize (list[str])
Return type: list[str]
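The "user-supplied or default" fallback described in the note can be sketched as follows. This is an illustration of the selection rule only, not the library's implementation. The function name `pick_quantizable_op_types` and the contents of `DEFAULT_QUANTIZABLE_OP_TYPES` are hypothetical, and the sketch assumes that an empty or missing list means "use the defaults".

```python
# Hypothetical stand-in for modelopt.onnx's default quantizable ops setting;
# the real default list may differ.
DEFAULT_QUANTIZABLE_OP_TYPES = ["Conv", "MatMul", "Gemm"]


def pick_quantizable_op_types(op_types_to_quantize):
    """Return the caller's op types if any were given, else the defaults."""
    if op_types_to_quantize:
        # Copy so later changes to the result do not affect the caller's list.
        return list(op_types_to_quantize)
    return list(DEFAULT_QUANTIZABLE_OP_TYPES)


user_choice = pick_quantizable_op_types(["Conv"])
fallback = pick_quantizable_op_types([])
```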

update_trt_ep_support(calibration_eps, has_dds_op, has_custom_op, trt_plugins)

Checks whether TRT should be enabled or disabled and updates the list of calibration EPs accordingly.

Parameters:

calibration_eps (list[str])
has_dds_op (bool)
has_custom_op (bool)
trt_plugins (list[str])
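The reference does not say how `has_dds_op`, `has_custom_op`, and `trt_plugins` combine into the enable/disable decision, so the sketch below covers only the final step: updating a list of calibration EPs once that decision is made. The helper `set_trt_enabled` is hypothetical, and the sketch assumes that `"trt"` is the EP name and that TRT should come first when enabled.

```python
def set_trt_enabled(calibration_eps, enable):
    """Return a copy of calibration_eps with the "trt" EP added or removed.

    The decision itself (enable or not) is made by the caller; this helper
    only applies it and keeps the order of the remaining EPs.
    """
    without_trt = [ep for ep in calibration_eps if ep != "trt"]
    if enable:
        # Assumption: TRT takes priority, so it goes first in the list.
        return ["trt", *without_trt]
    return without_trt


enabled = set_trt_enabled(["cpu"], enable=True)
disabled = set_trt_enabled(["trt", "cpu"], enable=False)
```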