partitioning

Utilities related to partitioning the ONNX model to place QDQ nodes.

Functions

`find_fusible_partitions`	Traverses the graph and collects all cask/kgen fusible partitions.
`find_hardcoded_patterns`	Finds some pre-defined non-quantizable patterns.
`find_layer_norm_partitions`	Finds the layer norm patterns in the graph.
`find_mha_partitions`	Finds the MHA core (QK_AV) patterns in the graph that should not be quantized.
`find_non_quantizable_partitions_from_patterns`	Finds fusible partitions from fixed patterns.
`find_quantizable_nodes`	Returns the graph ops that are quantizable but not partitioned yet.
`get_skipped_output_layers`	Returns the names of the non-quantizable output layers.

find_fusible_partitions(graph, partitioned_nodes, non_residual_inputs)

Traverses the graph and collects all cask/kgen fusible partitions.

Parameters:

graph (Graph) – ONNX model graph.
partitioned_nodes (set[str]) – Set of already partitioned nodes.
non_residual_inputs (dict[str, str]) – Non-residual input map.

Returns:

A tuple of two lists: the partitions that are fusible by CASK with a Conv/MatMul backbone, and the KGEN partitions that contain pointwise ops only.

Return type:

tuple[list[list[Node]], list[list[Node]]]

find_hardcoded_patterns(graph)

Finds some pre-defined non-quantizable patterns.

Note: matching the tail pattern below causes a -5.5% change on MTL_v1:

["ReduceSum", "Add", "Div", "Mul", "ReduceSum", "Sub", "Pow", "Mul", "ReduceSum", "Sqrt"]

Parameters: graph (Graph)
Return type: list[list[Node]]
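
To illustrate what matching a fixed op-type sequence can look like, here is a minimal sketch on a toy graph. It is not the library's implementation: `ToyNode`, `build_chain`, and `match_chain` are hypothetical names, the graph is a plain linked chain rather than an ONNX graph, and the pattern is a shortened, made-up sequence.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node: a name, an op type, and consumers."""
    name: str
    op: str
    outputs: list = field(default_factory=list)


def build_chain(ops):
    """Build a linear chain of ToyNode objects, one per op type."""
    nodes = [ToyNode(f"{op}_{i}", op) for i, op in enumerate(ops)]
    for producer, consumer in zip(nodes, nodes[1:]):
        producer.outputs.append(consumer)
    return nodes


def match_chain(start, pattern):
    """Return the nodes matching `pattern` starting at `start`, or None.

    Assumption: the match only follows nodes that have exactly one consumer.
    """
    matched = []
    node = start
    for i, op in enumerate(pattern):
        if node is None or node.op != op:
            return None
        matched.append(node)
        if i < len(pattern) - 1:
            node = node.outputs[0] if len(node.outputs) == 1 else None
    return matched


nodes = build_chain(["Conv", "Sub", "Pow", "ReduceSum", "Sqrt"])
pattern = ["Sub", "Pow", "ReduceSum", "Sqrt"]  # illustrative pattern only

found_names = []
for candidate in nodes:
    match = match_chain(candidate, pattern)
    if match is not None:
        found_names.append([n.name for n in match])
```

Each match is returned as a list of nodes, which mirrors the `list[list[Node]]` shape of the documented return type.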

find_layer_norm_partitions(graph)

Finds the layer norm patterns in the graph.

Parameters: graph (Graph)
Return type: list[list[Node]]

find_mha_partitions(graph)

Finds the MHA core (QK_AV) patterns in the graph that should not be quantized.

A common MHA implementation looks like this:

t -> MatMul -> (optional) pointwise ops (such as Add, Mul, Sub) -> Softmax -> MatMul -> output

Patterns that do not look like this should not be quantized (at least for now).

Parameters: graph (Graph)
Return type: list[list[Node]]
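
The MatMul -> pointwise -> Softmax -> MatMul shape described above can be sketched on a toy graph as follows. This is an illustration under stated assumptions, not the library's code: `ToyNode`, `link`, `match_mha_core`, and the `POINTWISE_OPS` set are hypothetical, and the second MatMul's other input (the value tensor) is ignored for brevity.

```python
from dataclasses import dataclass, field

# Assumption: the pointwise ops allowed between the first MatMul and Softmax.
POINTWISE_OPS = {"Add", "Mul", "Sub"}


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node."""
    name: str
    op: str
    outputs: list = field(default_factory=list)


def link(ops):
    """Build a linear chain of ToyNode objects from a list of op types."""
    nodes = [ToyNode(f"n{i}", op) for i, op in enumerate(ops)]
    for producer, consumer in zip(nodes, nodes[1:]):
        producer.outputs.append(consumer)
    return nodes


def match_mha_core(start):
    """Return [MatMul, pointwise..., Softmax, MatMul] starting at `start`, or None."""
    if start.op != "MatMul":
        return None
    matched = [start]
    node = start
    # Walk forward through optional pointwise ops until Softmax is reached.
    while True:
        if len(node.outputs) != 1:
            return None
        node = node.outputs[0]
        matched.append(node)
        if node.op == "Softmax":
            break
        if node.op not in POINTWISE_OPS:
            return None
    # Softmax must feed exactly one MatMul to complete the pattern.
    if len(node.outputs) != 1 or node.outputs[0].op != "MatMul":
        return None
    matched.append(node.outputs[0])
    return matched


good = link(["MatMul", "Mul", "Add", "Softmax", "MatMul"])
bad = link(["MatMul", "Relu", "Softmax", "MatMul"])

good_ops = [n.op for n in match_mha_core(good[0])]
bad_match = match_mha_core(bad[0])  # Relu is not in POINTWISE_OPS
```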

find_non_quantizable_partitions_from_patterns(graph)

Finds fusible partitions from fixed patterns.

In ONNX, the counterpart of certain fused kernels is often a subgraph of native ops. Those patterns are identified here and quantized to match compiler expectations.

Parameters: graph (Graph)
Return type: list[list[str]]

find_quantizable_nodes(graph, nodes_to_quantize, partitioned_nodes, quantizable_op_types)

Returns the graph ops that are quantizable but not partitioned yet.

Parameters:

graph (Graph)
nodes_to_quantize (list[Node])
partitioned_nodes (set[str])
quantizable_op_types (list[str])

Return type:

list[Node]
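
The idea of "quantizable but not partitioned yet" can be pictured as a simple filter. The sketch below is only an assumption about the general idea, not the actual function: `ToyNode` and `select_unpartitioned` are hypothetical, and the role of `nodes_to_quantize` is left out because the source does not describe it.

```python
from typing import NamedTuple


class ToyNode(NamedTuple):
    """Hypothetical stand-in for a graph node."""
    name: str
    op: str


def select_unpartitioned(nodes, partitioned_names, quantizable_op_types):
    """Keep nodes whose op type is quantizable and whose name is not yet partitioned."""
    allowed = set(quantizable_op_types)
    return [
        node
        for node in nodes
        if node.op in allowed and node.name not in partitioned_names
    ]


nodes = [
    ToyNode("conv1", "Conv"),
    ToyNode("relu1", "Relu"),
    ToyNode("matmul1", "MatMul"),
    ToyNode("pool1", "MaxPool"),
]

# "conv1" already belongs to a partition, and "Relu" is not in the quantizable list.
remaining = select_unpartitioned(nodes, {"conv1"}, ["Conv", "MatMul", "MaxPool"])
remaining_names = [node.name for node in remaining]
```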

get_skipped_output_layers(graph, partially_quantizable_nodes)

Returns the names of the non-quantizable output layers.

Parameters:

graph (Graph)
partially_quantizable_nodes (list[Node])

Return type:

list[str]