qdq_utils

Various utils to support inserting Q/DQ nodes.

Functions

insert_dq_nodes

Insert new initializers and DQ nodes into graph.

insert_pre_quant_scale_nodes

Insert new mul nodes into graph.

insert_qdq_nodes

Insert scales and QDQ nodes into graph.

make_gs_awq_scale

Create a GraphSurgeon scale tensor from the given numpy array.

make_gs_dequantize_node

Create a GraphSurgeon Dequantize node.

make_gs_dequantize_output

Create a GraphSurgeon variable representing the output of a dequantize node.

make_gs_pre_quant_scale_node

Create a GraphSurgeon pre-quant scale (Mul) node.

make_gs_pre_quant_scale_output

Create a GraphSurgeon variable representing the output of a pre-quant scale node.

make_gs_quantize_node

Create a GraphSurgeon Quantize node.

make_gs_quantize_output

Create a GraphSurgeon variable representing the output of a quantize node.

make_gs_quantized_weight

Create a GraphSurgeon tensor from a quantized weight tensor.

make_gs_scale

Create a GraphSurgeon scale tensor from the given numpy array.

make_gs_zp

Create a GraphSurgeon zero-point tensor of all zeroes with the given shape.

qdq_to_dq

Convert FP32/FP16 weights of the given ONNX model to INT8/FP8 weights.

replace_scale_values

Replaces the scale values with those from the calibration cache.

use_trt_qdq_ops

Globally set node names to TRT custom names.

insert_dq_nodes(graph, scales, quantized_weights, attributes=None, zero_points=None)

Insert new initializers and DQ nodes into graph.

Parameters:
  • graph (Graph) – The graph to modify.

  • scales (Dict[str, ndarray]) – A map from ONNX initializer name to the desired scale factor for that initializer.

  • quantized_weights (Dict[str, ndarray]) – A map from ONNX initializer name to its quantized weight tensor.

  • attributes (Dict[str, Any]) – Optional attributes for the DQ nodes.

  • zero_points (Dict[str, ndarray] | None) – Optional map from ONNX initializer name to its zero-point tensor.
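The DQ nodes this function inserts correspond to ONNX DequantizeLinear. As a reference for the underlying math only (a sketch, not a call into this API), the INT8 dequantization with an optional zero point looks like:

```python
import numpy as np

def dequantize_linear(wq, scale, zero_point=None):
    # Reference DequantizeLinear math: (wq - zero_point) * scale.
    zp = np.int32(0) if zero_point is None else zero_point.astype(np.int32)
    return (wq.astype(np.int32) - zp).astype(np.float32) * scale

wq = np.array([-128, 0, 127], dtype=np.int8)
scale = np.float32(0.05)
print(dequantize_linear(wq, scale))  # ≈ [-6.4, 0.0, 6.35]
```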

insert_pre_quant_scale_nodes(graph, input_tensors, pre_quant_scale)

Insert new mul nodes into graph.

Parameters:
  • graph (Graph) – The graph to modify.

  • input_tensors (Dict[str, str]) – A dictionary of weight tensor names mapped to corresponding input tensor names

  • pre_quant_scale (Dict[str, ndarray]) – A map from ONNX input tensor name to corresponding pre-quant scale.
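The inserted Mul nodes multiply each input tensor by its pre-quant scale before quantization. The sketch below illustrates the invariant that motivates this kind of smoothing (as in AWQ/SmoothQuant-style methods): if the weight is compensated by the inverse scale, the matmul output is unchanged. Whether this function itself rescales the weight is not stated here; the example is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4)).astype(np.float32)  # activation (batch, in_features)
w = rng.standard_normal((4, 3)).astype(np.float32)  # weight (in_features, out_features)
s = np.array([2.0, 0.5, 1.0, 4.0], dtype=np.float32)  # per-channel pre-quant scale

# The Mul node scales the input; compensating the weight by 1/s keeps the output unchanged.
y_ref = x @ w
y_scaled = (x * s) @ (w / s[:, None])
print(np.allclose(y_ref, y_scaled, atol=1e-5))  # True
```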

insert_qdq_nodes(graph, scales, weight_map)

Insert scales and QDQ nodes into graph.

Parameters:
  • graph (Graph) – The graph to modify.

  • scales (Dict[str, ndarray]) – A map from ONNX initializer name to desired scale factor for that initializer.

  • weight_map (Dict[str, Tensor]) – A map from ONNX initializer name to graphsurgeon tensor.
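Each inserted Q/DQ pair corresponds to ONNX QuantizeLinear followed by DequantizeLinear. A numpy sketch of that round trip for the INT8 case (reference math only, not this API):

```python
import numpy as np

def quantize_linear(w, scale):
    # QuantizeLinear: scale, round to nearest even, saturate to INT8.
    return np.clip(np.rint(w / scale), -128, 127).astype(np.int8)

def dequantize_linear(wq, scale):
    # DequantizeLinear: map back to float.
    return wq.astype(np.float32) * scale

w = np.array([0.1, -0.51, 1.0], dtype=np.float32)
scale = np.float32(0.01)
w_hat = dequantize_linear(quantize_linear(w, scale), scale)
# Round-trip error is bounded by scale / 2 for values inside the INT8 range.
print(np.max(np.abs(w_hat - w)) <= scale / 2)  # True
```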

make_gs_awq_scale(name, scale)

Create a GraphSurgeon scale tensor from the given numpy array.

name is the desired _basename_ of the tensor.

Parameters:
  • name (str) –

  • scale (ndarray) –

Return type:

Constant

make_gs_dequantize_node(name, inputs, outputs, attributes=None)

Create a GraphSurgeon Dequantize node.

name is the desired _basename_ of the node.

Parameters:
  • name (str) –

  • inputs (Sequence[Tensor]) –

  • outputs (Sequence[Tensor]) –

  • attributes (Dict[str, Any]) –

Return type:

Node

make_gs_dequantize_output(name, shape, dtype)

Create a GraphSurgeon variable representing the output of a dequantize node.

name is the desired _basename_ of the node.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype (dtype) –

Return type:

Variable

make_gs_pre_quant_scale_node(name, inputs, outputs)

Create a GraphSurgeon pre-quant scale (Mul) node.

name is the desired _basename_ of the node.

Parameters:
  • name (str) –

  • inputs (Sequence[Tensor]) –

  • outputs (Sequence[Tensor]) –

Return type:

Node

make_gs_pre_quant_scale_output(name, shape, dtype)

Create a GraphSurgeon variable representing the output of a pre-quant scale node.

name is the desired _basename_ of the node.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype (dtype) –

Return type:

Variable

make_gs_quantize_node(name, inputs, outputs)

Create a GraphSurgeon Quantize node.

name is the desired _basename_ of the node.

Parameters:
  • name (str) –

  • inputs (Sequence[Tensor]) –

  • outputs (Sequence[Tensor]) –

Return type:

Node

make_gs_quantize_output(name, shape, dtype)

Create a GraphSurgeon variable representing the output of a quantize node.

name is the desired _basename_ of the node.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype (onnx.TensorProto.DataType) –

Return type:

Variable

make_gs_quantized_weight(name, wq, dtype)

Create a GraphSurgeon tensor from a quantized weight tensor.

name is the desired _basename_ of the tensor.

Parameters:
  • name (str) –

  • wq (ndarray) –

  • dtype –

Return type:

Constant

make_gs_scale(name, scale)

Create a GraphSurgeon scale tensor from the given numpy array.

name is the desired _basename_ of the tensor.

Parameters:
  • name (str) –

  • scale (ndarray) –

Return type:

Constant

make_gs_zp(name, shape, dtype)

Create a GraphSurgeon zero-point tensor of all zeroes with the given shape.

name is the desired _basename_ of the tensor.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype –

Return type:

Constant

qdq_to_dq(onnx_model, verbose=False)

Convert FP32/FP16 weights of the given ONNX model to INT8/FP8 weights.

Q nodes are removed from the weight paths, so the output model carries only DQ nodes with the converted INT8/FP8 weights. Dangling Q nodes are also fused away, updating their consumers’ weights.

Parameters:
  • onnx_model (ModelProto) – ONNX model protobuf.

  • verbose (bool) –

Returns:

ONNX model protobuf with only DQ nodes for weights and QDQ nodes for activations.

Return type:

ModelProto
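This pass trades a Q→DQ pair on each constant weight for a single precomputed INT8 (or FP8) initializer plus a DQ node. A sketch of the INT8 weight-folding step (illustrative of the idea, not the actual implementation):

```python
import numpy as np

def fold_q_into_weight(w_fp, scale):
    # Precompute QuantizeLinear on the constant weight; only DQ remains at runtime.
    return np.clip(np.rint(w_fp / scale), -128, 127).astype(np.int8)

w = np.array([[0.2, -0.4], [1.27, -1.28]], dtype=np.float32)
wq = fold_q_into_weight(w, np.float32(0.01))

# The stored weight shrinks 4x vs FP32 and is dequantized on the fly by the DQ node.
print(wq.dtype, w.nbytes // wq.nbytes)  # int8 4
```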

replace_scale_values(graph, act_scales_dict)

Replaces the scale values with those from the calibration cache.

Parameters:
  • graph (GraphProto) –

  • act_scales_dict (Dict[str, float]) –

use_trt_qdq_ops()

Globally set node names to TRT custom names.