convert

AutoCast module for converting ONNX models to mixed precision.

AutoCast is a tool for converting FP32 ONNX models to mixed-precision FP32-FP16 or FP32-BF16 models. When casting FP32 to FP16/BF16, some nodes are more sensitive to the reduced precision and can degrade accuracy. AutoCast intelligently selects nodes to keep in FP32 precision to maintain model accuracy while benefiting from reduced precision on the rest of the nodes, and automatically injects cast operations around the selected nodes.
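
For intuition, a minimal sketch of the transformation (node names are illustrative): if a Softmax node is kept in FP32 while its neighbors are reduced, AutoCast inserts casts at its boundaries:

    Before (all FP32):  Conv -> Softmax -> MatMul
    After (mixed):      Conv(fp16) -> Cast(to=fp32) -> Softmax(fp32) -> Cast(to=fp16) -> MatMul(fp16)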

Functions

convert_to_f16

Convert model to mixed precision using PrecisionConverter.

convert_to_mixed_precision

Convert model to mixed precision.

convert_to_f16(model, low_precision_type='fp16', keep_io_types=True, disable_shape_infer=False, op_block_list=[], trt_plugins=[])

Convert model to mixed precision using PrecisionConverter.

This method bypasses the NodeClassifier and instead uses a simple op_block_list to decide which op types remain in FP32.

Parameters:
  • model (ModelProto) – ONNX model to convert.

  • low_precision_type (str) – Target precision to reduce to (‘fp16’ or ‘bf16’).

  • keep_io_types (bool) – Whether to preserve input/output types.

  • disable_shape_infer (bool) – Whether to disable shape inference.

  • op_block_list (list[str]) – List of operation types that should remain in FP32.

  • trt_plugins (list[str] | None) – List of TensorRT plugin library paths in .so format (compiled shared library).

Return type:

ModelProto
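
A minimal usage sketch, assuming the module is importable as modelopt.onnx.autocast.convert; the model path and blocked op types below are illustrative:

    import onnx

    from modelopt.onnx.autocast.convert import convert_to_f16

    # Load an FP32 ONNX model (path is illustrative).
    model = onnx.load("model_fp32.onnx")

    # Reduce to FP16, keeping the listed op types in FP32 and preserving
    # the original input/output dtypes.
    converted = convert_to_f16(
        model,
        low_precision_type="fp16",
        keep_io_types=True,
        op_block_list=["LayerNormalization", "Softmax"],
    )

    onnx.save(converted, "model_fp16.onnx")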

convert_to_mixed_precision(onnx_path, low_precision_type='fp16', nodes_to_exclude=None, op_types_to_exclude=None, data_max=512, init_max=np.float16(6.55e+04), keep_io_types=False, calibration_data=None, custom_rule=None, init_conversion_max_bytes=None, providers=['cpu'], trt_plugins=[], max_depth_of_reduction=None)

Convert model to mixed precision.

Parameters:
  • onnx_path (str) – Path to the input ONNX model.

  • low_precision_type (str) – Target precision to reduce to (‘fp16’ or ‘bf16’).

  • nodes_to_exclude (list[str] | None) – List of regex patterns to match node names that should remain in FP32.

  • op_types_to_exclude (list[str] | None) – List of operation types that should remain in FP32.

  • data_max (float) – Maximum absolute value allowed for node input and output values; nodes whose values exceed it are kept in FP32.

  • init_max (float) – Maximum absolute value allowed for initializers; initializers exceeding it keep their nodes in FP32.

  • keep_io_types (bool) – Whether to preserve input/output types.

  • calibration_data (str | None) – Path to input data file for reference runner.

  • custom_rule (NodeRuleBase | None) – Optional custom rule for node classification (inherits from NodeRuleBase).

  • init_conversion_max_bytes (int | None) – Maximum size in bytes for initializer conversion. Larger initializers will be cast at runtime.

  • providers (list[str]) – List of ORT execution providers.

  • trt_plugins (list[str]) – List of TensorRT plugin library paths in .so format (compiled shared library).

  • max_depth_of_reduction (int | None) – Maximum depth of reduction for node classification.

Returns:

The converted mixed precision model.

Return type:

onnx.ModelProto
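
A usage sketch, assuming the same import path as above; the model path, node-name regex, and excluded op types are illustrative:

    import onnx

    from modelopt.onnx.autocast.convert import convert_to_mixed_precision

    # Node-name regexes and op types listed here remain in FP32.
    converted = convert_to_mixed_precision(
        onnx_path="model_fp32.onnx",
        low_precision_type="fp16",
        nodes_to_exclude=["/softmax.*"],             # regex on node names
        op_types_to_exclude=["LayerNormalization"],
        data_max=512,                                # magnitude threshold for node I/O
        keep_io_types=False,
        providers=["cpu"],
    )

    onnx.save(converted, "model_mixed_precision.onnx")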