qdq_utils

Various utils to support inserting Q/DQ nodes.

Functions

fp4qdq_to_2dq

Convert FP32/FP16 weights of the given ONNX model to FP4 weights and scaling factors.

insert_dq_nodes

Insert new initializers and DQ nodes into graph.

insert_pre_quant_scale_nodes

Insert new mul nodes into graph.

insert_qdq_nodes

Insert scales and QDQ nodes into graph.

make_gs_awq_scale

Create a GraphSurgeon scale tensor from the given numpy array.

make_gs_dequantize_node

Create a GraphSurgeon Dequantize node.

make_gs_dequantize_output

Create a GraphSurgeon variable representing the output of a dequantize node.

make_gs_pre_quant_scale_node

Create a GraphSurgeon pre-quant scale (Mul) node.

make_gs_pre_quant_scale_output

Create a GraphSurgeon variable representing the output of a pre-quant scale node.

make_gs_quantize_node

Create a GraphSurgeon Quantize node.

make_gs_quantize_output

Create a GraphSurgeon variable representing the output of a quantize node.

make_gs_quantized_weight

Create a GraphSurgeon tensor from a quantized weight tensor.

make_gs_scale

Create a GraphSurgeon scale tensor from the given numpy array.

make_gs_zp

Create a GraphSurgeon zero-point tensor of all zeroes with the given shape.

qdq_to_dq

Convert FP32/FP16 weights of the given ONNX model to INT8/FP8 weights.

replace_fp4qdq_with_2dq

Replaces the given node in the ONNX graph with a subgraph consisting of two DequantizeLinear nodes.

replace_scale_values

Replaces the scale values with those from the calibration cache.

use_trt_qdq_ops

Globally set Q/DQ node types to TRT custom op names.

fp4qdq_to_2dq(onnx_model)

Convert FP32/FP16 weights of the given ONNX model to FP4 weights and scaling factors.

TRT_FP4QDQ nodes are removed, and in the output model each affected weight is instead consumed through two DQ nodes carrying the converted FP4 weights and scaling factors.

Parameters:

onnx_model (ModelProto) – ONNX model protobuf.

Returns:

ONNX model protobuf with DQ nodes for weights and DynQ + DQ nodes for activations.

Return type:

ModelProto
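
For reference, a minimal usage sketch follows. The file names and the qdq_utils import path are illustrative assumptions, not part of this API:

    import onnx

    import qdq_utils  # assumed import path for this module

    # Load a model whose FP4 quantization is expressed with TRT_FP4QDQ nodes.
    model = onnx.load("model_fp4qdq.onnx")

    # Rewrite it so each weight is stored in FP4 alongside its scaling factors
    # and is consumed through a pair of DequantizeLinear (DQ) nodes.
    model_2dq = qdq_utils.fp4qdq_to_2dq(model)
    onnx.save(model_2dq, "model_fp4_2dq.onnx")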

insert_dq_nodes(graph, scales, quantized_weights, attributes=None, zero_points=None)

Insert new initializers and DQ nodes into graph.

Parameters:
  • graph (Graph) – The graph to modify.

  • scales (Dict[str, ndarray]) – A map from ONNX initializer name to desired scale factor for that initializer.

  • quantized_weights (Dict[str, ndarray]) – A map from ONNX initializer name to its quantized weight tensor.

  • attributes (Dict[str, Any]) – Optional attributes to set on the inserted DQ nodes.

  • zero_points (Dict[str, ndarray] | None) – Optional map from ONNX initializer name to its zero-point tensor.
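
A minimal usage sketch, assuming the qdq_utils import path and illustrative tensor names, shapes, and scale values:

    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    import qdq_utils  # assumed import path for this module

    graph = gs.import_onnx(onnx.load("model.onnx"))

    # Hypothetical per-initializer data: already-quantized weights and the
    # scale factors needed to dequantize them.
    quantized_weights = {"fc1.weight": np.zeros((128, 64), dtype=np.int8)}
    scales = {"fc1.weight": np.array(0.02, dtype=np.float32)}

    qdq_utils.insert_dq_nodes(graph, scales, quantized_weights)
    onnx.save(gs.export_onnx(graph), "model_dq.onnx")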

insert_pre_quant_scale_nodes(graph, input_tensors, pre_quant_scale)

Insert new mul nodes into graph.

Parameters:
  • graph (Graph) – The graph to modify.

  • input_tensors (Dict[str, str]) – A map from weight tensor name to its corresponding input tensor name.

  • pre_quant_scale (Dict[str, ndarray]) – A map from ONNX input tensor name to corresponding pre-quant scale.
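
A minimal sketch of the expected inputs, assuming the qdq_utils import path; the tensor names and scale values are illustrative:

    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    import qdq_utils  # assumed import path for this module

    graph = gs.import_onnx(onnx.load("model.onnx"))

    # Map each weight tensor name to the input tensor it pairs with, and give
    # each such input a pre-quant scale; one Mul node is inserted per input.
    input_tensors = {"fc1.weight": "fc1_input"}
    pre_quant_scale = {"fc1_input": np.full((64,), 0.5, dtype=np.float32)}

    qdq_utils.insert_pre_quant_scale_nodes(graph, input_tensors, pre_quant_scale)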

insert_qdq_nodes(graph, scales, weight_map)

Insert scales and QDQ nodes into graph.

Parameters:
  • graph (Graph) – The graph to modify.

  • scales (Dict[str, ndarray]) – A map from ONNX initializer name to desired scale factor for that initializer.

  • weight_map (Dict[str, Tensor]) – A map from ONNX initializer name to graphsurgeon tensor.
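
A minimal sketch, assuming the qdq_utils import path; the uniform scale value is illustrative:

    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    import qdq_utils  # assumed import path for this module

    graph = gs.import_onnx(onnx.load("model.onnx"))

    # Collect the constant (weight) tensors and assign each one a scale factor.
    weight_map = {
        name: t for name, t in graph.tensors().items() if isinstance(t, gs.Constant)
    }
    scales = {name: np.array(0.01, dtype=np.float32) for name in weight_map}

    qdq_utils.insert_qdq_nodes(graph, scales, weight_map)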

make_gs_awq_scale(name, scale)

Create a GraphSurgeon scale tensor from the given numpy array.

name is the desired basename of the tensor.

Parameters:
  • name (str) –

  • scale (ndarray) –

Return type:

Constant

make_gs_dequantize_node(name, inputs, outputs, attributes=None)

Create a GraphSurgeon Dequantize node.

name is the desired basename of the node.

Parameters:
  • name (str) –

  • inputs (Sequence[Tensor]) –

  • outputs (Sequence[Tensor]) –

  • attributes (Dict[str, Any]) –

Return type:

Node

make_gs_dequantize_output(name, shape, dtype)

Create a GraphSurgeon variable representing the output of a dequantize node.

name is the desired basename of the node.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype (dtype) –

Return type:

Variable

make_gs_pre_quant_scale_node(name, inputs, outputs)

Create a GraphSurgeon pre-quant scale (Mul) node.

name is the desired basename of the node.

Parameters:
  • name (str) –

  • inputs (Sequence[Tensor]) –

  • outputs (Sequence[Tensor]) –

Return type:

Node

make_gs_pre_quant_scale_output(name, shape, dtype)

Create a GraphSurgeon variable representing the output of a pre-quant scale node.

name is the desired basename of the node.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype (dtype) –

Return type:

Variable

make_gs_quantize_node(name, inputs, outputs)

Create a GraphSurgeon Quantize node.

name is the desired basename of the node.

Parameters:
  • name (str) –

  • inputs (Sequence[Tensor]) –

  • outputs (Sequence[Tensor]) –

Return type:

Node

make_gs_quantize_output(name, shape, dtype)

Create a GraphSurgeon variable representing the output of a quantize node.

name is the desired basename of the node.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype (onnx.TensorProto.DataType) –

Return type:

Variable

make_gs_quantized_weight(name, wq, dtype)

Create a GraphSurgeon tensor from a quantized weight tensor.

name is the desired basename of the tensor.

Parameters:
  • name (str) –

  • wq (ndarray) –

  • dtype –

Return type:

Constant

make_gs_scale(name, scale)

Create a GraphSurgeon scale tensor from the given numpy array.

name is the desired basename of the tensor.

Parameters:
  • name (str) –

  • scale (ndarray) –

Return type:

Constant

make_gs_zp(name, shape, dtype)

Create a GraphSurgeon zero-point tensor of all zeroes with the given shape.

name is the desired basename of the tensor.

Parameters:
  • name (str) –

  • shape (Sequence[int]) –

  • dtype –

Return type:

Constant
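
The make_gs_* helpers above compose into a hand-built weight DQ subgraph. The sketch below assumes the qdq_utils import path; the tensor names, shapes, and the choice of ONNX enum versus NumPy dtype per helper follow the signatures listed here and are otherwise assumptions:

    import numpy as np
    import onnx

    import qdq_utils  # assumed import path for this module

    wq = np.zeros((128, 64), dtype=np.int8)  # illustrative quantized weight

    # Constants feeding the DQ node (ONNX tensor-type enum assumed for dtype).
    weight = qdq_utils.make_gs_quantized_weight("fc1.weight", wq, onnx.TensorProto.INT8)
    scale = qdq_utils.make_gs_scale("fc1.weight", np.array(0.02, dtype=np.float32))
    zp = qdq_utils.make_gs_zp("fc1.weight", (1,), onnx.TensorProto.INT8)

    # Output variable (NumPy dtype per the signature above) and the DQ node.
    out = qdq_utils.make_gs_dequantize_output(
        "fc1.weight", list(wq.shape), np.dtype(np.float32)
    )
    dq = qdq_utils.make_gs_dequantize_node(
        "fc1.weight", inputs=[weight, scale, zp], outputs=[out]
    )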

qdq_to_dq(onnx_model, verbose=False)

Convert FP32/FP16 weights of the given ONNX model to INT8/FP8 weights.

Q nodes are removed from the weights, so the output model keeps only DQ nodes that consume the converted INT8/FP8 weights. Dangling Q nodes are also fused into their consumers, updating the consumers' weights.

Parameters:
  • onnx_model (ModelProto) – ONNX model protobuf.

  • verbose (bool) –

Returns:

ONNX model protobuf with only DQ nodes for weights and QDQ nodes for activations.

Return type:

ModelProto
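
A minimal usage sketch, with illustrative file names and an assumed import path:

    import onnx

    import qdq_utils  # assumed import path for this module

    model = onnx.load("model_qdq.onnx")

    # Fold the weight-side Q nodes so that only DQ nodes remain on weights,
    # while activations keep their QDQ pairs.
    model_dq = qdq_utils.qdq_to_dq(model, verbose=True)
    onnx.save(model_dq, "model_dq.onnx")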

replace_fp4qdq_with_2dq(graph, node, initializer_indices, value_info_map, graph_inputs, w_f4, sw_f32_per_tensor, sw_f8_per_block, precision_dtype, block_size)

Replaces the given node in the ONNX graph with a subgraph consisting of two DequantizeLinear nodes.

Parameters:
  • graph (GraphProto) – The ONNX graph containing the node to replace.

  • node (NodeProto) – The node to be replaced.

  • initializer_indices (Dict[str, int]) – A dictionary mapping initializer names to their indices in the graph.

  • value_info_map (Dict[str, ValueInfoProto]) – A dictionary mapping value info names to their ValueInfoProto objects.

  • graph_inputs (Set[str]) – A set of graph input names.

  • w_f4 (ndarray) – NumPy array for w_f4.

  • sw_f32_per_tensor (ndarray) – NumPy array for sw_f32_per_tensor.

  • sw_f8_per_block (ndarray) – NumPy array for sw_f8_per_block.

  • precision_dtype (str) – The precision of the weights.

  • block_size (int) – Block size used in block quantization.
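
The three lookup arguments follow directly from the parameter descriptions above. A sketch of building them from an onnx.GraphProto (the model path is illustrative; the weight and scale arrays are produced by the FP4 conversion flow and omitted here):

    import onnx

    graph = onnx.load("model.onnx").graph

    # Lookup structures expected by replace_fp4qdq_with_2dq, built once per graph.
    initializer_indices = {init.name: idx for idx, init in enumerate(graph.initializer)}
    value_info_map = {vi.name: vi for vi in graph.value_info}
    graph_inputs = {inp.name for inp in graph.input}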

replace_scale_values(graph, act_scales_dict)

Replaces the scale values with those from the calibration cache.

Parameters:
  • graph (GraphProto) –

  • act_scales_dict (Dict[str, float]) –
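
A minimal sketch, assuming the qdq_utils import path; the cache entries are illustrative:

    import onnx

    import qdq_utils  # assumed import path for this module

    model = onnx.load("model_qdq.onnx")

    # Activation scales read from a calibration cache, keyed by tensor name.
    act_scales_dict = {"fc1_input": 0.0123, "fc2_input": 0.0456}

    qdq_utils.replace_scale_values(model.graph, act_scales_dict)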

use_trt_qdq_ops()

Globally set Q/DQ node types to TRT custom op names.
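
Presumably this is called before inserting Q/DQ nodes so that subsequently created nodes use the TRT custom ops; a sketch reusing graph, scales, and weight_map from the insert_qdq_nodes sketch above:

    import qdq_utils  # assumed import path for this module

    # Switch to TRT custom Q/DQ op names, then insert nodes as usual.
    qdq_utils.use_trt_qdq_ops()
    qdq_utils.insert_qdq_nodes(graph, scales, weight_map)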