partitioning
Utilities related to partitioning the ONNX model to place QDQ nodes.
Functions
find_fusible_partitions | Traverses the graph and collects all CASK/KGEN fusible partitions.
find_hardcoded_patterns | Finds some non-quantizable predefined patterns.
find_layer_norm_partitions | Finds the layer norm patterns in the graph.
find_mha_partitions | Finds the MHA patterns in the graph that should not be quantized.
find_non_quantizable_partitions_from_patterns | Finds fusible partitions from fixed patterns.
find_quantizable_nodes | Returns the graph ops that are quantizable but not yet partitioned.
get_skiped_output_layers | Returns the names of the non-quantizable output layers.
- find_fusible_partitions(graph, partitioned_nodes, non_residual_inputs)
Traverses the graph and collects all CASK/KGEN fusible partitions.
- Parameters:
graph (Graph) – ONNX model graph.
partitioned_nodes (Set[str]) – Set of already partitioned nodes.
non_residual_inputs (Dict[str, str]) – Non-residual input map.
- Returns:
A tuple of two lists: the partitions that are fusible by CASK with a Conv/MatMul backbone, and the KGEN partitions with pointwise ops only.
- Return type:
Tuple[List[List[Node]], List[List[Node]]]
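The two-list return value can be pictured with a small standalone sketch. It does not reproduce the library's graph traversal: partitions are plain lists of op-type strings rather than Node objects, and `split_partitions` and `BACKBONE_OPS` are hypothetical names introduced only for illustration.

```python
from typing import List, Tuple

# Assumption: a partition counts as "CASK-like" when it contains a Conv or
# MatMul op, following the "Conv/MatMul backbone" wording above.
BACKBONE_OPS = {"Conv", "MatMul"}


def split_partitions(
    partitions: List[List[str]],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Split partitions into (backbone partitions, pointwise-only partitions)."""
    cask: List[List[str]] = []
    kgen: List[List[str]] = []
    for partition in partitions:
        if any(op in BACKBONE_OPS for op in partition):
            cask.append(partition)
        else:
            kgen.append(partition)
    return cask, kgen


cask_partitions, kgen_partitions = split_partitions(
    [["Conv", "Relu"], ["Add", "Mul"], ["MatMul", "Add"]]
)
```

Callers of the real function would unpack the result the same way, as a pair of lists.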
- find_hardcoded_patterns(graph)
Finds some non-quantizable predefined patterns.
Note: matching the following tail pattern causes a -5.5% change on MTL_v1: ["ReduceSum", "Add", "Div", "Mul", "ReduceSum", "Sub", "Pow", "Mul", "ReduceSum", "Sqrt"]
- Parameters:
graph (Graph) – ONNX model graph.
- Return type:
List[List[Node]]
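As a rough illustration of matching a fixed op-type sequence, the sketch below slides the pattern over a linear chain of op types. It assumes the pattern describes a simple chain and that the list order corresponds to the chain order, neither of which is stated here; `find_pattern_starts` is a hypothetical helper, not part of this module.

```python
from typing import List, Sequence

# The tail pattern quoted in the note above.
TAIL_PATTERN = [
    "ReduceSum", "Add", "Div", "Mul", "ReduceSum",
    "Sub", "Pow", "Mul", "ReduceSum", "Sqrt",
]


def find_pattern_starts(op_chain: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return every index at which `pattern` occurs contiguously in `op_chain`."""
    n = len(pattern)
    return [
        i
        for i in range(len(op_chain) - n + 1)
        if list(op_chain[i : i + n]) == list(pattern)
    ]


chain = ["Conv", "Relu"] + TAIL_PATTERN + ["Mul"]
starts = find_pattern_starts(chain, TAIL_PATTERN)
```

A real matcher would walk node inputs and outputs in the graph rather than a flat list.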
- find_layer_norm_partitions(graph)
Finds the layer norm patterns in the graph.
- Parameters:
graph (Graph) – ONNX model graph.
- Return type:
List[List[Node]]
- find_mha_partitions(graph)
Finds the MHA patterns in the graph that should not be quantized.
A common MHA implementation looks like this: t -> MatMul -> (optional) pointwise ops (such as Add, Mul, Sub) -> Softmax -> MatMul -> output.
Patterns that do not look like this should not be quantized (at least for now).
- Parameters:
graph (Graph) – ONNX model graph.
- Return type:
List[List[Node]]
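The chain shape described for find_mha_partitions can be checked on a flat list of op types, as in the minimal sketch below. It assumes the optional pointwise ops are limited to the three named in the description (Add, Mul, Sub); `looks_like_mha` and `POINTWISE_OPS` are hypothetical names, and the library's own matcher works on graph nodes rather than strings.

```python
from typing import Sequence

# Assumption: only the pointwise ops named in the description are allowed
# between the first MatMul and the Softmax.
POINTWISE_OPS = {"Add", "Mul", "Sub"}


def looks_like_mha(op_chain: Sequence[str]) -> bool:
    """Check for MatMul -> (pointwise)* -> Softmax -> MatMul."""
    if len(op_chain) < 3 or op_chain[0] != "MatMul":
        return False
    i = 1
    # Skip over any optional pointwise ops after the first MatMul.
    while i < len(op_chain) and op_chain[i] in POINTWISE_OPS:
        i += 1
    # Exactly Softmax followed by MatMul must remain.
    return list(op_chain[i:]) == ["Softmax", "MatMul"]
```

For example, `["MatMul", "Add", "Softmax", "MatMul"]` fits the shape, while a chain with a `Relu` before the `Softmax` does not.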
- find_non_quantizable_partitions_from_patterns(graph)
Finds fusible partitions from fixed patterns.
The counterpart of certain fused kernels is often a subgraph of native ONNX ops. Those patterns are identified here and quantized to match compiler expectations.
- Parameters:
graph (Graph) – ONNX model graph.
- Return type:
List[List[str]]
- find_quantizable_nodes(graph, nodes_to_quantize, partitioned_nodes, quantizable_op_types)
Returns the graph ops that are quantizable but not yet partitioned.
- Parameters:
graph (Graph) – ONNX model graph.
nodes_to_quantize (List[Node]) –
partitioned_nodes (Set[str]) – Set of already partitioned nodes.
quantizable_op_types (List[str]) –
- Return type:
List[Node]
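The core idea of "quantizable but not yet partitioned" can be sketched as a simple filter. This is an illustration under assumptions, not the library's logic: `ToyNode` is a hypothetical stand-in for the real Node type, `unpartitioned_quantizable` is a hypothetical helper, and the sketch ignores `nodes_to_quantize`, whose role is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node: just a name and an op type."""
    name: str
    op: str


def unpartitioned_quantizable(
    nodes: Iterable[ToyNode],
    partitioned_nodes: Set[str],
    quantizable_op_types: List[str],
) -> List[ToyNode]:
    """Keep nodes whose op type is quantizable and whose name is not partitioned."""
    allowed = set(quantizable_op_types)
    return [
        node
        for node in nodes
        if node.op in allowed and node.name not in partitioned_nodes
    ]


nodes = [ToyNode("conv1", "Conv"), ToyNode("relu1", "Relu"), ToyNode("add1", "Add")]
result = unpartitioned_quantizable(nodes, {"conv1"}, ["Conv", "Add"])
```

Here `conv1` is dropped because it is already partitioned, and `relu1` because its op type is not in the quantizable list.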
- get_skiped_output_layers(graph, paritially_quantizable_nodes)
Returns the names of the non-quantizable output layers.
- Parameters:
graph (Graph) – ONNX model graph.
paritially_quantizable_nodes (List[Node]) –
- Return type:
List[str]