ort_utils
Provides basic ORT inference utilities; should be replaced by modelopt.torch.ort_client.
Functions
configure_ort: Configure and patch ORT to support ModelOpt ONNX quantization.
create_inference_session: Create an ORT InferenceSession.
get_quantizable_op_types: Returns a set of quantizable op types.
- configure_ort(op_types, op_types_to_quantize, trt_extra_plugin_lib_paths=None, calibration_eps=None)
Configure and patch ORT to support ModelOpt ONNX quantization.
- Parameters:
op_types (List[str]) –
op_types_to_quantize (List[str]) –
trt_extra_plugin_lib_paths (str) –
calibration_eps (List[str]) –
- create_inference_session(onnx_path_or_model, calibration_eps)
Create an ORT InferenceSession.
- Parameters:
onnx_path_or_model (str | bytes) –
calibration_eps (List[str]) –
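As a rough illustration of what creating a session for a list of calibration EPs might involve, the sketch below translates short EP names into ONNX Runtime provider entries and passes them to onnxruntime.InferenceSession, which accepts either a file path or serialized model bytes. The short names ("cpu", "cuda:0", "trt"), the helper names to_ort_providers and make_session, and the mapping itself are assumptions made for illustration; this is not the ModelOpt implementation.

```python
from typing import List, Union

# Assumed mapping from short EP names to ONNX Runtime provider identifiers.
# The short names are hypothetical; the provider strings are ORT's own.
_EP_TO_PROVIDER = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "trt": "TensorrtExecutionProvider",
}


def to_ort_providers(calibration_eps: List[str]) -> list:
    """Translate names such as "cuda:0" into ORT provider entries (hypothetical helper)."""
    providers = []
    for ep in calibration_eps:
        name, _, device_id = ep.partition(":")
        if name not in _EP_TO_PROVIDER:
            raise ValueError(f"Unknown execution provider: {ep!r}")
        if device_id:
            # ORT accepts (provider_name, options) tuples for per-provider options.
            providers.append((_EP_TO_PROVIDER[name], {"device_id": int(device_id)}))
        else:
            providers.append(_EP_TO_PROVIDER[name])
    return providers


def make_session(onnx_path_or_model: Union[str, bytes], calibration_eps: List[str]):
    """Build an InferenceSession from a file path or serialized model bytes."""
    import onnxruntime as ort  # imported lazily so the helper above works without ORT

    return ort.InferenceSession(
        onnx_path_or_model, providers=to_ort_providers(calibration_eps)
    )
```

ORT tries providers in the order given, so listing "cpu" last leaves a fallback when a GPU provider is unavailable.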
- get_quantizable_op_types(op_types_to_quantize)
Returns a set of quantizable op types.
Note: This function should be called after quantize._configure_ort() has been called once. It returns the quantizable op types either from the user-supplied parameter or from modelopt.onnx’s default quantizable ops setting.
- Parameters:
op_types_to_quantize (List[str]) –
- Return type:
List[str]
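The "user-supplied parameter or default setting" behavior described for get_quantizable_op_types can be pictured with a minimal sketch. The function name pick_quantizable_op_types and the DEFAULT_QUANTIZABLE_OPS list are hypothetical placeholders; the actual defaults come from modelopt.onnx after ORT has been configured and are not reproduced here.

```python
from typing import List, Optional

# Placeholder defaults for illustration only; not modelopt.onnx's actual list.
DEFAULT_QUANTIZABLE_OPS = ["Conv", "MatMul", "Gemm"]


def pick_quantizable_op_types(op_types_to_quantize: Optional[List[str]]) -> List[str]:
    """Return the caller's list when it is non-empty, otherwise a copy of the defaults."""
    if op_types_to_quantize:
        return list(op_types_to_quantize)
    return list(DEFAULT_QUANTIZABLE_OPS)


print(pick_quantizable_op_types(["Conv"]))  # caller's choice wins
print(pick_quantizable_op_types(None))      # falls back to the defaults
```

Returning copies keeps callers from mutating the shared default list by accident.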