convert

AutoCast module for converting ONNX models to mixed precision.

AutoCast is a tool for converting FP32 ONNX models to mixed-precision FP32-FP16 or FP32-BF16 models. When casting FP32 to FP16/BF16, some nodes are more sensitive to the reduced precision and can degrade accuracy. AutoCast intelligently selects nodes to keep in FP32 precision to maintain model accuracy while benefiting from reduced precision on the rest of the nodes, and automatically injects cast operations around the selected nodes.
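
For intuition, a minimal sketch of the transformation (node names are illustrative): if a Softmax node is kept in FP32 while its neighbors are reduced, AutoCast inserts casts at its boundaries:

    Before (all FP32):  Conv -> Softmax -> MatMul
    After (mixed):      Conv(fp16) -> Cast(to=fp32) -> Softmax(fp32) -> Cast(to=fp16) -> MatMul(fp16)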

Functions

convert_to_f16

Convert model to mixed precision using PrecisionConverter.

convert_to_mixed_precision

Convert model to mixed precision.

convert_to_f16(model, low_precision_type='fp16', keep_io_types=True, disable_shape_infer=False, op_block_list=[], trt_plugins=[])

Convert model to mixed precision using PrecisionConverter.

This method bypasses the NodeClassifier and instead uses a simple op_block_list to decide which op types remain in FP32.

Parameters:
  • model (ModelProto) – ONNX model to convert.

  • low_precision_type (str) – Target precision to reduce to (‘fp16’ or ‘bf16’).

  • keep_io_types (bool) – Whether to preserve input/output types.

  • disable_shape_infer (bool) – Whether to disable shape inference.

  • op_block_list (list[str]) – List of operation types that should remain in FP32.

  • trt_plugins (list[str] | None) – List of TensorRT plugin library paths in .so format (compiled shared library).

Return type:

ModelProto
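
A minimal usage sketch, assuming the module is importable as modelopt.onnx.autocast.convert; the model path and blocked op types below are illustrative:

    import onnx

    from modelopt.onnx.autocast.convert import convert_to_f16

    # Load an FP32 ONNX model (path is illustrative).
    model = onnx.load("model_fp32.onnx")

    # Reduce to FP16, keeping the listed op types in FP32 and preserving
    # the original input/output dtypes.
    converted = convert_to_f16(
        model,
        low_precision_type="fp16",
        keep_io_types=True,
        op_block_list=["LayerNormalization", "Softmax"],
    )

    onnx.save(converted, "model_fp16.onnx")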

convert_to_mixed_precision(onnx_path, low_precision_type='fp16', nodes_to_exclude=None, op_types_to_exclude=None, data_max=512, init_max=np.float16(6.55e+04), keep_io_types=False, calibration_data=None, custom_rule=None, init_conversion_max_bytes=None, providers=['cpu'], trt_plugins=[], max_depth_of_reduction=None)

Convert model to mixed precision.

Parameters:
  • onnx_path (str) – Path to the input ONNX model.

  • low_precision_type (str) – Target precision to reduce to (‘fp16’ or ‘bf16’).

  • nodes_to_exclude (list[str] | None) – List of regex patterns to match node names that should remain in FP32.

  • op_types_to_exclude (list[str] | None) – List of operation types that should remain in FP32.

  • data_max (float) – Maximum absolute value allowed for node input and output values; nodes whose values exceed it are kept in FP32.

  • init_max (float) – Maximum absolute value allowed for initializers; initializers exceeding it keep their nodes in FP32.

  • keep_io_types (bool) – Whether to preserve input/output types.

  • calibration_data (str | None) – Path to input data file for reference runner.

  • custom_rule (NodeRuleBase | None) – Optional custom rule for node classification (inherits from NodeRuleBase).

  • init_conversion_max_bytes (int | None) – Maximum size in bytes for initializer conversion. Larger initializers will be cast at runtime.

  • providers (list[str]) – List of ORT execution providers.

  • trt_plugins (list[str]) – List of TensorRT plugin library paths in .so format (compiled shared library).

  • max_depth_of_reduction (int | None) – Maximum depth of reduction for node classification.

Returns:

The converted mixed precision model.

Return type:

onnx.ModelProto
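
A usage sketch, assuming the same import path as above; the model path, node-name regex, and excluded op types are illustrative:

    import onnx

    from modelopt.onnx.autocast.convert import convert_to_mixed_precision

    # Node-name regexes and op types listed here remain in FP32.
    converted = convert_to_mixed_precision(
        onnx_path="model_fp32.onnx",
        low_precision_type="fp16",
        nodes_to_exclude=["/softmax.*"],             # regex on node names
        op_types_to_exclude=["LayerNormalization"],
        data_max=512,                                # magnitude threshold for node I/O
        keep_io_types=False,
        providers=["cpu"],
    )

    onnx.save(converted, "model_mixed_precision.onnx")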