partitioning

Utilities related to partitioning the ONNX model to place QDQ nodes.

Functions

find_fusible_partitions

Traverses the graph and collects all cask/kgen fusible partitions.

find_hardcoded_patterns

Finds some non-quantizable pre-defined patterns.

find_layer_norm_partitions

Finds the layer norm patterns in the graph.

find_mha_partitions

Finds the MHA patterns in the graph that should not be quantized.

find_non_quantizable_partitions_from_patterns

Finds fusible partitions from fixed patterns.

find_quantizable_nodes

Return the graph ops which are quantizable but not partitioned yet.

get_skiped_output_layers

Returns the names of the non-quantizable output layers.

find_fusible_partitions(graph, partitioned_nodes, non_residual_inputs)

Traverses the graph and collects all cask/kgen fusible partitions.

Parameters:
  • graph (Graph) – ONNX model graph.

  • partitioned_nodes (Set[str]) – Set of already partitioned nodes.

  • non_residual_inputs (Dict[str, str]) – Non-residual input map.

Returns:

A tuple of two lists: the partitions that are fusible by CASK with a Conv/MatMul backbone, and the KGEN partitions that contain pointwise ops only.

Return type:

Tuple[List[List[Node]], List[List[Node]]]
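The shape of the returned tuple can be illustrated with a minimal sketch. It is not the library's implementation: partitions are modeled as plain lists of op-type strings rather than Node objects, the helper name split_partitions is hypothetical, and every partition without a Conv/MatMul op is assumed to be pointwise-only.

```python
from typing import List, Tuple

# Backbone op types named in the return description above.
BACKBONE_OPS = {"Conv", "MatMul"}


def split_partitions(
    partitions: List[List[str]],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical helper: sort partitions into (CASK-like, KGEN-like) lists.

    Assumption: a partition containing a Conv or MatMul op has a backbone;
    any other partition is treated as pointwise-only.
    """
    cask: List[List[str]] = []
    kgen: List[List[str]] = []
    for part in partitions:
        if any(op in BACKBONE_OPS for op in part):
            cask.append(part)
        else:
            kgen.append(part)
    return cask, kgen


cask_parts, kgen_parts = split_partitions(
    [["Conv", "Add", "Relu"], ["Mul", "Add"], ["MatMul", "Add"]]
)
```

Unpacking the result into two names mirrors the Tuple[List[List[Node]], List[List[Node]]] return type: the first list holds the backbone partitions, the second the pointwise-only ones.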

find_hardcoded_patterns(graph)

Finds some non-quantizable pre-defined patterns.

Note: matching the following tail pattern causes a -5.5% change on MTL_v1:

["ReduceSum", "Add", "Div", "Mul", "ReduceSum", "Sub", "Pow", "Mul", "ReduceSum", "Sqrt"]

Parameters:

graph (Graph) – ONNX model graph.

Return type:

List[List[Node]]
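As a rough illustration of matching a fixed op-type pattern such as the tail pattern in the note, here is a minimal sketch. It assumes the graph has already been flattened into a linear list of op-type strings; the real function works on Graph nodes and may follow branches differently. The helper name find_op_sequence is hypothetical.

```python
from typing import List

# Tail pattern quoted in the note for find_hardcoded_patterns.
TAIL_PATTERN = [
    "ReduceSum", "Add", "Div", "Mul", "ReduceSum",
    "Sub", "Pow", "Mul", "ReduceSum", "Sqrt",
]


def find_op_sequence(op_types: List[str], pattern: List[str]) -> List[int]:
    """Hypothetical helper: start indices where `pattern` occurs contiguously."""
    n, m = len(op_types), len(pattern)
    return [i for i in range(n - m + 1) if op_types[i:i + m] == pattern]


# A toy chain: two unrelated ops, the tail pattern, then one more op.
chain = ["Conv", "Relu"] + TAIL_PATTERN + ["Mul"]
matches = find_op_sequence(chain, TAIL_PATTERN)
```

Here matches contains the single start index 2, the position where the pattern begins in the toy chain.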

find_layer_norm_partitions(graph)

Finds the layer norm patterns in the graph.

Parameters:

graph (Graph) – ONNX model graph.

Return type:

List[List[Node]]

find_mha_partitions(graph)

Finds the MHA patterns in the graph that should not be quantized.

A common MHA implementation looks like this:

t -> MatMul -> (optional) pointwise ops (such as Add, Mul, Sub) -> Softmax -> MatMul -> output

Patterns that do not look like this should not be quantized (at least for now).

Parameters:

graph (Graph) – ONNX model graph.

Return type:

List[List[Node]]
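The MatMul -> pointwise -> Softmax -> MatMul chain described for find_mha_partitions can be sketched on a toy linked list of nodes. This is an illustration under assumptions, not the library's matcher: ChainNode, link, and match_mha are hypothetical names, each node has at most one successor, and the pointwise set is limited to the three ops given as examples.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only the pointwise ops named as examples; the real set is an assumption.
POINTWISE_OPS = {"Add", "Mul", "Sub"}


@dataclass
class ChainNode:
    """Toy stand-in for a graph node with a single successor."""
    name: str
    op: str
    next: Optional["ChainNode"] = None


def link(*nodes: ChainNode) -> ChainNode:
    """Connect nodes in order and return the first one."""
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes[0]


def match_mha(start: ChainNode) -> Optional[List[ChainNode]]:
    """Return the matched chain starting at `start`, or None if no match."""
    if start.op != "MatMul":
        return None
    chain = [start]
    node = start.next
    # Zero or more optional pointwise ops.
    while node is not None and node.op in POINTWISE_OPS:
        chain.append(node)
        node = node.next
    if node is None or node.op != "Softmax":
        return None
    chain.append(node)
    node = node.next
    if node is None or node.op != "MatMul":
        return None
    chain.append(node)
    return chain


head = link(
    ChainNode("qk", "MatMul"),
    ChainNode("scale", "Mul"),
    ChainNode("mask", "Add"),
    ChainNode("softmax", "Softmax"),
    ChainNode("av", "MatMul"),
)
matched = match_mha(head)
names = [n.name for n in matched] if matched else []
```

A chain that breaks the pattern, for example MatMul followed directly by Relu, makes match_mha return None.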

find_non_quantizable_partitions_from_patterns(graph)

Finds fusible partitions from fixed patterns.

The counterpart of certain fused kernels is often a subgraph of native ops in ONNX. Those patterns are identified here and quantized to match compiler expectations.

Parameters:

graph (Graph) – ONNX model graph.

Return type:

List[List[str]]

find_quantizable_nodes(graph, nodes_to_quantize, partitioned_nodes, quantizable_op_types)

Return the graph ops which are quantizable but not partitioned yet.

Parameters:
  • graph (Graph) – ONNX model graph.

  • nodes_to_quantize (List[Node]) –

  • partitioned_nodes (Set[str]) –

  • quantizable_op_types (List[str]) –

Return type:

List[Node]
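The idea of "quantizable but not partitioned yet" can be sketched as a simple filter. This is an assumption-laden illustration, not the library's logic: ToyNode and unpartitioned_quantizable are hypothetical names, nodes are matched by op type and name only, and the nodes_to_quantize argument is left out because its role is not described here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ToyNode:
    """Toy stand-in for a graph node."""
    name: str
    op: str


def unpartitioned_quantizable(
    nodes: List[ToyNode],
    partitioned_nodes: Set[str],
    quantizable_op_types: List[str],
) -> List[ToyNode]:
    """Keep nodes with a quantizable op type whose name is not yet partitioned."""
    allowed = set(quantizable_op_types)
    return [
        n for n in nodes
        if n.op in allowed and n.name not in partitioned_nodes
    ]


nodes = [
    ToyNode("conv1", "Conv"),
    ToyNode("relu1", "Relu"),
    ToyNode("matmul1", "MatMul"),
    ToyNode("conv2", "Conv"),
]
remaining = unpartitioned_quantizable(
    nodes,
    partitioned_nodes={"conv1"},
    quantizable_op_types=["Conv", "MatMul"],
)
remaining_names = [n.name for n in remaining]
```

In this toy run conv1 is skipped because it is already partitioned and relu1 because Relu is not in the quantizable list, leaving matmul1 and conv2.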

get_skiped_output_layers(graph, paritially_quantizable_nodes)

Returns the names of the non-quantizable output layers.

Parameters:
  • graph (Graph) – ONNX model graph.

  • paritially_quantizable_nodes (List[Node]) –

Return type:

List[str]