qdq_utils
Various utilities to support inserting Q/DQ nodes.
Functions

| Function | Description |
| --- | --- |
| insert_dq_nodes | Insert new initializers and DQ nodes into graph. |
| insert_pre_quant_scale_nodes | Insert new Mul nodes into graph. |
| insert_qdq_nodes | Insert scales and QDQ nodes into graph. |
| make_gs_awq_scale | Create a GraphSurgeon scale tensor from the given numpy array. |
| make_gs_dequantize_node | Create a GraphSurgeon Dequantize node. |
| make_gs_dequantize_output | Create a GraphSurgeon variable representing the output of a dequantize node. |
| make_gs_pre_quant_scale_node | Create a GraphSurgeon pre-quant scale (Mul) node. |
| make_gs_pre_quant_scale_output | Create a GraphSurgeon variable representing the output of a pre-quant scale node. |
| make_gs_quantize_node | Create a GraphSurgeon Quantize node. |
| make_gs_quantize_output | Create a GraphSurgeon variable representing the output of a quantize node. |
| make_gs_quantized_weight | Create a GraphSurgeon tensor from a quantized weight tensor. |
| make_gs_scale | Create a GraphSurgeon scale tensor from the given numpy array. |
| make_gs_zp | Create a GraphSurgeon zero-point tensor of all zeroes with the given shape. |
| qdq_to_dq | Convert FP32/FP16 weights of the given ONNX model to INT8/FP8 weights. |
| replace_scale_values | Replace the scale values with those from the calibration cache. |
| use_trt_qdq_ops | Globally set node names to TRT custom names. |
- insert_dq_nodes(graph, scales, quantized_weights, attributes=None, zero_points=None)
Insert new initializers and DQ nodes into graph.
- Parameters:
graph (Graph) – The graph to modify.
scales (Dict[str, ndarray]) – A map from ONNX initializer name to the desired scale factor for that initializer.
quantized_weights (Dict[str, ndarray]) – A map from ONNX initializer name to its quantized weight tensor.
attributes (Dict[str, Any]) – Optional attributes to set on the inserted DQ nodes.
zero_points (Dict[str, ndarray] | None) – Optional map from ONNX initializer name to its zero-point tensor.
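For example, a minimal sketch of inserting a DQ node for one pre-quantized initializer; the import path modelopt.onnx.quantization.qdq_utils, the initializer name, and the shapes are assumptions for illustration:

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Assumed import path; adjust to wherever qdq_utils lives in your installation.
from modelopt.onnx.quantization.qdq_utils import insert_dq_nodes

graph = gs.import_onnx(onnx.load("model.onnx"))

# Hypothetical initializer name, per-tensor scale, and pre-quantized INT8 weight;
# in practice these come from your quantization/calibration step.
name = "fc1.weight"
scales = {name: np.array(0.02, dtype=np.float32)}
quantized_weights = {name: np.zeros((128, 256), dtype=np.int8)}

insert_dq_nodes(graph, scales, quantized_weights)

onnx.save(gs.export_onnx(graph), "model_dq.onnx")
```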
- insert_pre_quant_scale_nodes(graph, input_tensors, pre_quant_scale)
Insert new mul nodes into graph.
- Parameters:
graph (Graph) – The graph to modify.
input_tensors (Dict[str, str]) – A map from weight tensor name to its corresponding input tensor name.
pre_quant_scale (Dict[str, ndarray]) – A map from ONNX input tensor name to corresponding pre-quant scale.
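A sketch of inserting a pre-quant scale Mul node, e.g. for AWQ-style smoothing; the import path and the tensor names are assumptions for illustration:

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

from modelopt.onnx.quantization.qdq_utils import insert_pre_quant_scale_nodes  # assumed path

graph = gs.import_onnx(onnx.load("model.onnx"))

# Hypothetical mapping: each weight tensor name points at the input tensor that
# feeds its consumer, and each such input gets a per-channel pre-quant scale.
input_tensors = {"fc1.weight": "fc1_input"}
pre_quant_scale = {"fc1_input": np.ones((256,), dtype=np.float32)}

insert_pre_quant_scale_nodes(graph, input_tensors, pre_quant_scale)
```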
- insert_qdq_nodes(graph, scales, weight_map)
Insert scales and QDQ nodes into graph.
- Parameters:
graph (Graph) – The graph to modify.
scales (Dict[str, ndarray]) – A map from ONNX initializer name to desired scale factor for that initializer.
weight_map (Dict[str, Tensor]) – A map from ONNX initializer name to graphsurgeon tensor.
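A sketch of inserting full Q/DQ pairs for the graph's constant weights; the import path is an assumption, and the uniform scale of 0.01 is a placeholder:

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

from modelopt.onnx.quantization.qdq_utils import insert_qdq_nodes  # assumed path

graph = gs.import_onnx(onnx.load("model.onnx"))

# Collect the initializers to quantize (here: every constant tensor in the graph)
# and pair each one with a placeholder per-tensor scale.
weight_map = {name: t for name, t in graph.tensors().items() if isinstance(t, gs.Constant)}
scales = {name: np.array(0.01, dtype=np.float32) for name in weight_map}

insert_qdq_nodes(graph, scales, weight_map)
```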
- make_gs_awq_scale(name, scale)
Create a GraphSurgeon scale tensor from the given numpy array.
name is the desired _basename_ of the tensor.
- Parameters:
name (str) –
scale (ndarray) –
- Return type:
Constant
- make_gs_dequantize_node(name, inputs, outputs, attributes=None)
Create a GraphSurgeon Dequantize node.
name is the desired _basename_ of the node.
- Parameters:
name (str) –
inputs (Sequence[Tensor]) –
outputs (Sequence[Tensor]) –
attributes (Dict[str, Any]) –
- Return type:
Node
- make_gs_dequantize_output(name, shape, dtype)
Create a GraphSurgeon variable representing the output of a dequantize node.
name is the desired _basename_ of the node.
- Parameters:
name (str) –
shape (Sequence[int]) –
dtype (dtype) –
- Return type:
Variable
- make_gs_pre_quant_scale_node(name, inputs, outputs)
Create a GraphSurgeon pre-quant scale (Mul) node.
name is the desired _basename_ of the node.
- Parameters:
name (str) –
inputs (Sequence[Tensor]) –
outputs (Sequence[Tensor]) –
- Return type:
Node
- make_gs_pre_quant_scale_output(name, shape, dtype)
Create a GraphSurgeon variable representing the output of a pre-quant scale node.
name is the desired _basename_ of the node.
- Parameters:
name (str) –
shape (Sequence[int]) –
dtype (dtype) –
- Return type:
Variable
- make_gs_quantize_node(name, inputs, outputs)
Create a GraphSurgeon Quantize node.
name is the desired _basename_ of the node.
- Parameters:
name (str) –
inputs (Sequence[Tensor]) –
outputs (Sequence[Tensor]) –
- Return type:
Node
- make_gs_quantize_output(name, shape, dtype)
Create a GraphSurgeon variable representing the output of a quantize node.
name is the desired _basename_ of the node.
- Parameters:
name (str) –
shape (Sequence[int]) –
dtype (onnx.TensorProto.DataType) –
- Return type:
Variable
- make_gs_quantized_weight(name, wq, dtype)
Create a GraphSurgeon tensor from a quantized weight tensor.
name is the desired _basename_ of the tensor.
- Parameters:
name (str) –
wq (ndarray) –
- Return type:
Constant
- make_gs_scale(name, scale)
Create a GraphSurgeon scale tensor from the given numpy array.
name is the desired _basename_ of the tensor.
- Parameters:
name (str) –
scale (ndarray) –
- Return type:
Constant
- make_gs_zp(name, shape, dtype)
Create a GraphSurgeon zero-point tensor of all zeroes with the given shape.
name is the desired _basename_ of the tensor.
- Parameters:
name (str) –
shape (Sequence[int]) –
- Return type:
Constant
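Taken together, the make_gs_* helpers above can assemble a Q/DQ pair by hand. A rough sketch, assuming the Quantize node takes (input, scale, zero-point) and the Dequantize node takes the quantized output plus the same scale and zero-point; the import path, names, shapes, and the exact dtype arguments expected by each helper are assumptions for illustration:

```python
import numpy as np
import onnx

# Assumed import path for the helpers documented above.
from modelopt.onnx.quantization.qdq_utils import (
    make_gs_dequantize_node,
    make_gs_dequantize_output,
    make_gs_quantize_node,
    make_gs_quantize_output,
    make_gs_quantized_weight,
    make_gs_scale,
    make_gs_zp,
)

name = "fc1.weight"                           # hypothetical basename
wq = np.zeros((128, 256), dtype=np.int8)      # placeholder quantized weight

weight = make_gs_quantized_weight(name, wq, onnx.TensorProto.INT8)
scale = make_gs_scale(name, np.array(0.02, dtype=np.float32))
zp = make_gs_zp(name, (), onnx.TensorProto.INT8)  # per-tensor zero point

# Quantize: (weight, scale, zero-point) -> quantized output variable.
q_out = make_gs_quantize_output(name, wq.shape, onnx.TensorProto.INT8)
q_node = make_gs_quantize_node(name, inputs=[weight, scale, zp], outputs=[q_out])

# Dequantize: (quantized output, scale, zero-point) -> FP32 output variable.
dq_out = make_gs_dequantize_output(name, wq.shape, np.dtype(np.float32))
dq_node = make_gs_dequantize_node(name, inputs=[q_out, scale, zp], outputs=[dq_out])

# The new nodes still have to be attached to a graph,
# e.g. graph.nodes.extend([q_node, dq_node]).
```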
- qdq_to_dq(onnx_model, verbose=False)
Convert FP32/FP16 weights of the given ONNX model to INT8/FP8 weights.
Q nodes are removed from the weight paths, leaving only DQ nodes with the converted INT8/FP8 weights in the output model. Dangling Q nodes are also fused away, updating their consumers' weights.
- Parameters:
onnx_model (ModelProto) – ONNX model protobuf.
verbose (bool) –
- Returns:
ONNX model protobuf with only DQ nodes for weights and QDQ nodes for activations.
- Return type:
ModelProto
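A sketch of the typical round trip; the import path is an assumption:

```python
import onnx

from modelopt.onnx.quantization.qdq_utils import qdq_to_dq  # assumed path

model = onnx.load("model_qdq.onnx")       # model with Q/DQ nodes around its weights
dq_only = qdq_to_dq(model, verbose=True)  # weights become INT8/FP8, weight Q nodes are dropped
onnx.save(dq_only, "model_dq_only.onnx")
```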
- replace_scale_values(graph, act_scales_dict)
Replace the scale values with those from the calibration cache.
- Parameters:
graph (GraphProto) –
act_scales_dict (Dict[str, float]) –
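A sketch of applying calibrated activation scales to an existing Q/DQ model; the import path and the tensor names in the scale dictionary are assumptions (in practice they would come from a parsed calibration cache):

```python
import onnx

from modelopt.onnx.quantization.qdq_utils import replace_scale_values  # assumed path

model = onnx.load("model_qdq.onnx")

# Hypothetical activation scales keyed by tensor name, e.g. parsed from a
# TensorRT calibration cache.
act_scales = {"input": 0.0123, "conv1_out": 0.0456}

replace_scale_values(model.graph, act_scales)
onnx.save(model, "model_qdq_calibrated.onnx")
```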
- use_trt_qdq_ops()
Globally set node names to TRT custom names.
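Because the effect is global, this presumably needs to be called before any of the insertion helpers above; a minimal sketch, with the import path assumed:

```python
from modelopt.onnx.quantization.qdq_utils import use_trt_qdq_ops  # assumed path

# Switch to TRT custom Q/DQ node names; Q/DQ nodes created afterwards by the
# helpers above should then use these names instead of the standard ONNX ones.
use_trt_qdq_ops()
```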