ort_utils
Provides basic ORT inference utilities; should be replaced by modelopt.torch.ort_client.
Functions
configure_ort | Configures and patches ORT to support ModelOpt ONNX quantization.
create_inference_session | Create an ORT InferenceSession.
get_quantizable_op_types | Returns a set of quantizable op types.
update_trt_ep_support | Checks whether TRT should be enabled or disabled and updates the list of calibration EPs accordingly.
- configure_ort(op_types, op_types_to_quantize, trt_extra_plugin_lib_paths=None, calibration_eps=None)
Configures and patches ORT to support ModelOpt ONNX quantization.
- Parameters:
op_types (list[str]) –
op_types_to_quantize (list[str]) –
trt_extra_plugin_lib_paths (str) –
calibration_eps (list[str]) –
- create_inference_session(onnx_path_or_model, calibration_eps)
Create an ORT InferenceSession.
- Parameters:
onnx_path_or_model (str | bytes) –
calibration_eps (list[str]) –
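As a rough illustration of how a calibration_eps list could relate to ONNX Runtime, the sketch below translates short EP names into provider entries. The short names ("cpu", "cuda:0", "trt") and the helper to_ort_providers are assumptions made for this example and are not confirmed by this page; only the provider strings are ONNX Runtime's own.

```python
# Hypothetical helper: translate short EP names into ONNX Runtime provider entries.
# The short names used as keys are assumed for illustration only.
_EP_TO_PROVIDER = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "trt": "TensorrtExecutionProvider",
}


def to_ort_providers(calibration_eps):
    """Return a providers list in the order given by calibration_eps."""
    providers = []
    for ep in calibration_eps:
        # Split an optional device suffix, e.g. "cuda:0" -> ("cuda", "0").
        name, _, device_id = ep.partition(":")
        if name not in _EP_TO_PROVIDER:
            raise ValueError(f"Unsupported EP: {ep!r}")
        provider = _EP_TO_PROVIDER[name]
        if device_id:
            # ONNX Runtime accepts (provider_name, options) tuples.
            providers.append((provider, {"device_id": int(device_id)}))
        else:
            providers.append(provider)
    return providers


providers = to_ort_providers(["trt", "cuda:0", "cpu"])
```

A list built this way could be passed as the providers argument of onnxruntime.InferenceSession. Whether create_inference_session performs a similar translation internally is not stated here.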
- get_quantizable_op_types(op_types_to_quantize)
Returns a set of quantizable op types.
Note: Call this function only after quantize._configure_ort() has been called once. It returns the quantizable op types either from the user-supplied parameter or from modelopt.onnx’s default quantizable ops setting.
- Parameters:
op_types_to_quantize (list[str]) –
- Return type:
list[str]
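The fallback described in the note (user-supplied op types if given, otherwise the defaults) can be sketched in plain Python. The function pick_quantizable_op_types and the DEFAULT_QUANTIZABLE_OPS list are hypothetical stand-ins; the real defaults come from modelopt.onnx and are not listed on this page.

```python
# Hypothetical default list, used only to make the sketch runnable.
DEFAULT_QUANTIZABLE_OPS = ["Conv", "MatMul", "Gemm"]


def pick_quantizable_op_types(op_types_to_quantize, default_ops=DEFAULT_QUANTIZABLE_OPS):
    """Return the user's op types if any were given, otherwise the defaults."""
    if op_types_to_quantize:
        return list(op_types_to_quantize)
    # None or an empty list falls back to the default setting.
    return list(default_ops)


user_choice = pick_quantizable_op_types(["Conv"])
fallback = pick_quantizable_op_types(None)
```

Both branches return a fresh list, so callers can modify the result without affecting the defaults.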
- update_trt_ep_support(calibration_eps, has_dds_op, has_custom_op, trt_plugins)
Checks whether TRT should be enabled or disabled and updates the list of calibration EPs accordingly.
- Parameters:
calibration_eps (list[str]) –
has_dds_op (bool) –
has_custom_op (bool) –
trt_plugins (str) –
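This page does not state the rule that update_trt_ep_support applies, so the following is only a sketch of the general idea of adjusting an EP list from model flags. The policy (custom ops force TRT on, DDS ops turn it off), the "trt" short name, and the function sketch_update_trt_ep are all assumptions for illustration, not the library's behavior.

```python
def sketch_update_trt_ep(calibration_eps, has_dds_op, has_custom_op):
    """Return a new EP list with "trt" added or removed under the assumed policy."""
    eps = list(calibration_eps)  # copy so the caller's list is left untouched
    if has_custom_op:
        # Assumed: custom ops rely on TensorRT plugins, so TRT is tried first.
        if "trt" not in eps:
            eps.insert(0, "trt")
    elif has_dds_op:
        # Assumed: models with data-dependent-shape ops are calibrated without TRT.
        eps = [ep for ep in eps if ep != "trt"]
    return eps


with_custom_op = sketch_update_trt_ep(["cpu"], has_dds_op=False, has_custom_op=True)
with_dds_op = sketch_update_trt_ep(["trt", "cpu"], has_dds_op=True, has_custom_op=False)
```

The sketch returns a new list instead of mutating its input; the documented function is described as updating the list, so its actual return and mutation behavior may differ.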