Core Transform APIs

Common Transform Configuration

Most transforms accept these common fields. Stage pages also show transform-specific configuration models when a transform extends this base configuration.

pydantic model tensorrt_llm._torch.auto_deploy.transform.interface.TransformConfig

Bases: BaseModel

A simple configuration class that can be extended by a transform for configurability.

JSON schema:

{
   "title": "TransformConfig",
   "description": "A simple configuration class that can be extended by a transform for configurability.",
   "type": "object",
   "properties": {
      "stage": {
         "$ref": "#/$defs/Stages",
         "description": "The stage of the transformation pipeline where this transform should run."
      },
      "run_per_gm": {
         "default": true,
         "description": "Whether to run the transform per graph (sub)module or on whole module.",
         "title": "Run Per Gm",
         "type": "boolean"
      },
      "enabled": {
         "default": true,
         "description": "Whether to enable this transform.",
         "title": "Enabled",
         "type": "boolean"
      },
      "skip_on_error": {
         "default": false,
         "description": "Whether to skip the transform if an error occurs.",
         "title": "Skip On Error",
         "type": "boolean"
      },
      "run_graph_cleanup": {
         "default": true,
         "description": "Whether to run graph cleanup/canonicalization after this transform.",
         "title": "Run Graph Cleanup",
         "type": "boolean"
      },
      "run_shape_prop": {
         "default": false,
         "description": "Whether to run shape propagation after this transform.",
         "title": "Run Shape Prop",
         "type": "boolean"
      },
      "requires_clean_graph": {
         "default": true,
         "description": "Whether this transform requires the graph to be clean before it is applied.",
         "title": "Requires Clean Graph",
         "type": "boolean"
      },
      "requires_shape_prop": {
         "default": false,
         "description": "Whether this transform requires shape propagation before it is applied.",
         "title": "Requires Shape Prop",
         "type": "boolean"
      },
      "debug_visualize_dir": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Debug visualization directory. None to disable visualization, or a path string to specify the output directory.",
         "title": "Debug Visualize Dir"
      },
      "expect_mem_change": {
         "default": false,
         "description": "Whether this transform is expected to cause changes in CUDA memory stats.",
         "title": "Expect Mem Change",
         "type": "boolean"
      }
   },
   "$defs": {
      "Stages": {
         "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.",
         "enum": [
            "factory",
            "export",
            "post_export",
            "pattern_matcher",
            "sharding",
            "weight_load",
            "post_load_fusion",
            "cache_init",
            "visualize",
            "compile"
         ],
         "title": "Stages",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "stage"
   ]
}

Config:

extra: str = allow

Fields:

debug_visualize_dir (str | None)
enabled (bool)
expect_mem_change (bool)
requires_clean_graph (bool)
requires_shape_prop (bool)
run_graph_cleanup (bool)
run_per_gm (bool)
run_shape_prop (bool)
skip_on_error (bool)
stage (tensorrt_llm._torch.auto_deploy.transform.interface.Stages)

field debug_visualize_dir: str | None = None: Debug visualization directory. None to disable visualization, or a path string to specify the output directory.

field enabled: bool = True: Whether to enable this transform.

field expect_mem_change: bool = False: Whether this transform is expected to cause changes in CUDA memory stats.

field requires_clean_graph: bool = True: Whether this transform requires the graph to be clean before it is applied.

field requires_shape_prop: bool = False: Whether this transform requires shape propagation before it is applied.

field run_graph_cleanup: bool = True: Whether to run graph cleanup/canonicalization after this transform.

field run_per_gm: bool = True: Whether to run the transform per graph (sub)module or on whole module.

field run_shape_prop: bool = False: Whether to run shape propagation after this transform.

field skip_on_error: bool = False: Whether to skip the transform if an error occurs.

field stage: Stages [Required]: The stage of the transformation pipeline where this transform should run.
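
The schema above reduces to three rules: `stage` is required, every other field has a default, and unknown keys are accepted (`extra = allow`). The following is a minimal stdlib-only sketch of those rules. `resolve_transform_config`, `DEFAULTS`, and `VALID_STAGES` are hypothetical names used for illustration and are not part of tensorrt_llm, which uses a pydantic model instead.

```python
from typing import Any, Dict

# Stage names copied from the "Stages" enum in the schema above.
VALID_STAGES = (
    "factory", "export", "post_export", "pattern_matcher", "sharding",
    "weight_load", "post_load_fusion", "cache_init", "visualize", "compile",
)

# Defaults copied from the TransformConfig schema above.
DEFAULTS: Dict[str, Any] = {
    "run_per_gm": True,
    "enabled": True,
    "skip_on_error": False,
    "run_graph_cleanup": True,
    "run_shape_prop": False,
    "requires_clean_graph": True,
    "requires_shape_prop": False,
    "debug_visualize_dir": None,
    "expect_mem_change": False,
}


def resolve_transform_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge user overrides onto the documented defaults."""
    if "stage" not in overrides:
        raise ValueError("'stage' is required")
    if overrides["stage"] not in VALID_STAGES:
        raise ValueError(f"unknown stage: {overrides['stage']!r}")
    # Keys that are not in DEFAULTS are kept, mirroring extra=allow.
    return {**DEFAULTS, **overrides}


# "my_option" stands in for a transform-specific field.
cfg = resolve_transform_config(
    {"stage": "sharding", "skip_on_error": True, "my_option": 4}
)
```

Only the overridden keys differ from the defaults; `my_option` passes through untouched, which is how a transform-specific option could ride along with the common fields.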

Transform Interface

The interface for all transforms.

This module defines the base classes and interfaces for all transforms.

class tensorrt_llm._torch.auto_deploy.transform.interface.MemStats(tot: float, free: float, resv: float, alloc: float, frag: float)

Bases: object

Memory statistics snapshot for tracking CUDA memory usage.

tot: float

free: float

resv: float

alloc: float

frag: float

diff(other: MemStats) → MemStats: Calculate the difference (self - other).

to_dict() → Dict[str, float]: Convert to dictionary for serialization.

classmethod from_dict(d: Dict[str, float]) → MemStats: Create from dictionary.
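
To make the `diff`, `to_dict`, and `from_dict` contract concrete, here is a small stand-in built on a stdlib dataclass. `MemStatsSketch` is a hypothetical name; it only mirrors the five documented fields and the documented `self - other` semantics, and the numbers are illustrative values with no particular unit.

```python
from dataclasses import asdict, dataclass, fields
from typing import Dict


@dataclass
class MemStatsSketch:
    """Stand-in with the same five fields as the documented MemStats."""

    tot: float
    free: float
    resv: float
    alloc: float
    frag: float

    def diff(self, other: "MemStatsSketch") -> "MemStatsSketch":
        """Field-wise difference (self - other)."""
        return MemStatsSketch(
            **{f.name: getattr(self, f.name) - getattr(other, f.name) for f in fields(self)}
        )

    def to_dict(self) -> Dict[str, float]:
        """Plain dict, suitable for serialization."""
        return asdict(self)

    @classmethod
    def from_dict(cls, d: Dict[str, float]) -> "MemStatsSketch":
        """Inverse of to_dict."""
        return cls(**d)


# Snapshots taken before and after some step that allocates memory.
before = MemStatsSketch(tot=80.0, free=60.0, resv=20.0, alloc=16.0, frag=4.0)
after = MemStatsSketch(tot=80.0, free=52.0, resv=28.0, alloc=24.0, frag=4.0)
delta = after.diff(before)
```

A positive `delta.alloc` together with a negative `delta.free` is the kind of change that the `expect_mem_change` field of `TransformConfig` refers to.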

exception tensorrt_llm._torch.auto_deploy.transform.interface.TransformError

Bases: Exception

An exception raised when a transform fails.

class tensorrt_llm._torch.auto_deploy.transform.interface.Stages(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enumerated (ordered!) stages of the transformation pipeline.

This is used to classify and pre-order transforms.

FACTORY = 'factory'

EXPORT = 'export'

POST_EXPORT = 'post_export'

PATTERN_MATCHER = 'pattern_matcher'

SHARDING = 'sharding'

WEIGHT_LOAD = 'weight_load'

POST_LOAD_FUSION = 'post_load_fusion'

CACHE_INIT = 'cache_init'

VISUALIZE = 'visualize'

COMPILE = 'compile'
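
Because the enum is ordered, the position of a member in the definition can serve as a sort key. The sketch below redefines the enum locally so that it runs without tensorrt_llm and sorts a few transforms by stage. `STAGE_RANK`, `order_by_stage`, and the transform names are hypothetical, and the library may implement its own ordering differently.

```python
from enum import Enum
from typing import List, Tuple


class Stages(Enum):
    """Local copy of the documented enum, in the documented order."""

    FACTORY = "factory"
    EXPORT = "export"
    POST_EXPORT = "post_export"
    PATTERN_MATCHER = "pattern_matcher"
    SHARDING = "sharding"
    WEIGHT_LOAD = "weight_load"
    POST_LOAD_FUSION = "post_load_fusion"
    CACHE_INIT = "cache_init"
    VISUALIZE = "visualize"
    COMPILE = "compile"


# Iterating an Enum follows definition order, which gives each stage a rank.
STAGE_RANK = {stage: rank for rank, stage in enumerate(Stages)}


def order_by_stage(transforms: List[Tuple[str, str]]) -> List[str]:
    """Sort (transform_name, stage_value) pairs by stage; ties keep input order."""
    ranked = sorted(transforms, key=lambda item: STAGE_RANK[Stages(item[1])])
    return [name for name, _ in ranked]


# Transform names here are made up for the example.
ordered = order_by_stage(
    [
        ("compile_model", "compile"),
        ("load_weights", "weight_load"),
        ("build_model", "factory"),
        ("fuse_gemms", "post_load_fusion"),
    ]
)
```

Since `sorted` is stable, transforms that share a stage keep the order in which they were listed.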

pydantic model tensorrt_llm._torch.auto_deploy.transform.interface.SharedConfig

Bases: BaseModel

Global config shared between multiple transforms in the inference optimizer.

JSON schema:

{
   "title": "SharedConfig",
   "description": "Global config shared between multiple transforms in the inference optimizer.",
   "type": "object",
   "properties": {
      "local_rank": {
         "default": 0,
         "title": "Local Rank",
         "type": "integer"
      },
      "world_size": {
         "default": 1,
         "title": "World Size",
         "type": "integer"
      },
      "dist_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/DistConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      }
   },
   "$defs": {
      "DistConfig": {
         "additionalProperties": true,
         "description": "Distributed parallelism configuration for AutoDeploy.",
         "properties": {
            "world_size": {
               "default": 1,
               "minimum": 1,
               "title": "World Size",
               "type": "integer"
            },
            "rank": {
               "default": 0,
               "minimum": 0,
               "title": "Rank",
               "type": "integer"
            },
            "tp_size": {
               "default": 1,
               "minimum": 1,
               "title": "Tp Size",
               "type": "integer"
            },
            "pp_size": {
               "default": 1,
               "minimum": 1,
               "title": "Pp Size",
               "type": "integer"
            },
            "moe_tp_size": {
               "default": 1,
               "minimum": 1,
               "title": "Moe Tp Size",
               "type": "integer"
            },
            "moe_ep_size": {
               "default": 1,
               "minimum": 1,
               "title": "Moe Ep Size",
               "type": "integer"
            },
            "moe_cluster_size": {
               "default": 1,
               "minimum": 1,
               "title": "Moe Cluster Size",
               "type": "integer"
            },
            "enable_attention_dp": {
               "default": false,
               "title": "Enable Attention Dp",
               "type": "boolean"
            },
            "allreduce_strategy": {
               "default": "NCCL",
               "title": "Allreduce Strategy",
               "type": "string"
            }
         },
         "title": "DistConfig",
         "type": "object"
      }
   },
   "additionalProperties": true
}

Config:

extra: str = allow
arbitrary_types_allowed: bool = True

Fields:

dist_config (tensorrt_llm._torch.auto_deploy.utils.dist_config.DistConfig | None)
local_rank (int)
world_size (int)

field dist_config: DistConfig | None = None

field local_rank: int = 0

field world_size: int = 1

pydantic model tensorrt_llm._torch.auto_deploy.transform.interface.TransformInfo

Bases: BaseModel

Information about the result of a transform.

JSON schema:

{
   "title": "TransformInfo",
   "description": "Information about the result of a transform.",
   "type": "object",
   "properties": {
      "skipped": {
         "default": true,
         "description": "Whether the transform was skipped.",
         "title": "Skipped",
         "type": "boolean"
      },
      "num_matches": {
         "default": 0,
         "description": "Number of matches found.",
         "title": "Num Matches",
         "type": "integer"
      },
      "is_clean": {
         "default": false,
         "description": "Whether the graph is clean after the transform. This can be set by the transform to indicate that the transform does not change the graph and it preserves the is_clean flag of the last transform.",
         "title": "Is Clean",
         "type": "boolean"
      },
      "has_valid_shapes": {
         "default": false,
         "description": "Whether meta tensor shapes are valid after the transform. This can be set by the transform to indicate that the transform does not affect the shapes in the meta information of the graph. In other words, the transform does not change the shapes of the tensors in the graph and it preserves the has_valid_shapes flag of the last transform.",
         "title": "Has Valid Shapes",
         "type": "boolean"
      }
   }
}

Config:

frozen: bool = True

Fields:

has_valid_shapes (bool)
is_clean (bool)
num_matches (int)
skipped (bool)

field has_valid_shapes: bool = False: Whether meta tensor shapes are valid after the transform. This can be set by the transform to indicate that the transform does not affect the shapes in the meta information of the graph. In other words, the transform does not change the shapes of the tensors in the graph and it preserves the has_valid_shapes flag of the last transform.

field is_clean: bool = False: Whether the graph is clean after the transform. This can be set by the transform to indicate that the transform does not change the graph and it preserves the is_clean flag of the last transform.

field num_matches: int = 0: Number of matches found.

field skipped: bool = True: Whether the transform was skipped.

classmethod from_last_info(info: TransformInfo) → TransformInfo: Create a new TransformInfo from the last transform info.
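
Two details of this model are easy to miss: the defaults describe a transform that was skipped and guarantees nothing about the graph, and the model is frozen, so a result is built once and not mutated afterwards. The sketch below shows both with a frozen stdlib dataclass. `TransformInfoSketch` is a hypothetical stand-in, not the pydantic model itself.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TransformInfoSketch:
    """Stand-in with the documented fields and defaults."""

    skipped: bool = True
    num_matches: int = 0
    is_clean: bool = False
    has_valid_shapes: bool = False


# With no arguments, the result reads as "skipped, nothing guaranteed".
default_info = TransformInfoSketch()

# A transform that rewrote three patterns reports it by building a new object.
applied = replace(default_info, skipped=False, num_matches=3)

# Assigning to a field of a frozen instance is rejected.
try:
    applied.num_matches = 5
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here `applied` leaves `is_clean` and `has_valid_shapes` at False, which by the field descriptions above means the graph is not reported as clean and its shape metadata is not reported as valid.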

tensorrt_llm._torch.auto_deploy.transform.interface.with_transform_logging(call_fn: Callable) → Callable

Decorator to prepend transform-specific prefix to all ad_logger logs during __call__.

Temporarily patches ad_logger.log so that any logs emitted within the call automatically include the [stage=…, transform=…] prefix that _log_info would otherwise add manually. The original logger behavior is restored after the call, even if an exception occurs.
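
The patch-and-restore pattern described above can be sketched with the standard library alone. `SketchLogger`, `with_prefix_logging`, and `DemoTransform` are hypothetical stand-ins for `ad_logger`, the real decorator, and a transform; the actual implementation may differ in how it builds the prefix and which attributes it reads.

```python
import functools
from typing import Callable, List


class SketchLogger:
    """Hypothetical stand-in for ad_logger: records messages instead of printing."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def log(self, msg: str) -> None:
        self.records.append(msg)


logger = SketchLogger()


def with_prefix_logging(call_fn: Callable) -> Callable:
    """Prefix every logger.log message emitted while call_fn runs."""

    @functools.wraps(call_fn)
    def wrapper(self, *args, **kwargs):
        original_log = logger.log
        prefix = f"[stage={self.stage}, transform={self.name}]"

        def patched_log(msg: str) -> None:
            original_log(f"{prefix} {msg}")

        logger.log = patched_log  # temporarily replace the log function
        try:
            return call_fn(self, *args, **kwargs)
        finally:
            logger.log = original_log  # restored even if call_fn raised

    return wrapper


class DemoTransform:
    stage = "sharding"
    name = "demo"

    @with_prefix_logging
    def __call__(self, fail: bool = False) -> str:
        logger.log("running")
        if fail:
            raise RuntimeError("boom")
        return "done"
```

Putting the restore step in `finally` is what guarantees that a failing transform does not leave the prefix attached to later, unrelated log lines.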

class tensorrt_llm._torch.auto_deploy.transform.interface.BaseTransform(config: TransformConfig)

Bases: ABC

A base class for all transforms.

classmethod get_transform_key() → str

Get the short name of the transform.

This is used to identify the transform in the transformation pipeline.

classmethod get_config_class() → Type[TransformConfig]

Get the configuration class for the transform.

This is used to validate the configuration of the transform.

config: TransformConfig

final classmethod from_kwargs(**kwargs) → BaseTransform

Create a transform from kwargs.

Parameters: **kwargs – The configuration for the transform.
Returns: The transform instance.

class tensorrt_llm._torch.auto_deploy.transform.interface.TransformRegistry

Bases: object

A registry for all transforms.

classmethod register(name: str) → Callable[[Type[BaseTransform]], Type[BaseTransform]]

classmethod get(name: str) → Type[BaseTransform]: Get the transform class by name.

classmethod get_config_class(name: str) → Type[TransformConfig]: Get the configuration class for a transform by name.

classmethod has(name: str) → bool: Check if a transform is registered.
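
The return type of `register` (a callable that takes and returns a transform class) indicates that it is meant to be used as a class decorator. The sketch below reproduces that shape with a plain dictionary. `RegistrySketch`, `BaseTransformSketch`, and the key `"fuse_demo"` are hypothetical, and the real registry may add validation that is not shown here.

```python
from typing import Callable, Dict, Type


class BaseTransformSketch:
    """Hypothetical stand-in for BaseTransform."""


class RegistrySketch:
    """Name-to-class registry with the same register/get/has shape."""

    _registry: Dict[str, Type[BaseTransformSketch]] = {}

    @classmethod
    def register(
        cls, name: str
    ) -> Callable[[Type[BaseTransformSketch]], Type[BaseTransformSketch]]:
        def decorator(transform_cls: Type[BaseTransformSketch]) -> Type[BaseTransformSketch]:
            cls._registry[name] = transform_cls
            return transform_cls  # the class itself is returned unchanged

        return decorator

    @classmethod
    def get(cls, name: str) -> Type[BaseTransformSketch]:
        return cls._registry[name]

    @classmethod
    def has(cls, name: str) -> bool:
        return name in cls._registry


@RegistrySketch.register("fuse_demo")
class FuseDemo(BaseTransformSketch):
    """Example transform registered under a made-up key."""
```

After the decorator runs, a pipeline can go from a string key in a config file to the class that implements it via `RegistrySketch.get("fuse_demo")`.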

Optimizer

High-level entrypoint to transform a model into an efficient inference model.

class tensorrt_llm._torch.auto_deploy.transform.optimizer.InferenceOptimizer(factory: ModelFactory, config: Mapping[str, TransformConfig | Dict[str, Any]], dist_config: DistConfig | None = None)

Bases: object

Graph Module Visualizer

PyTorch GraphModule Visualization Tool

This module provides functionality to convert PyTorch GraphModule to Graphviz diagrams. Supports different node styles and detailed graph annotations.

Key Features:

- Convert FX GraphModule to Graphviz diagrams
- Display tensor shape information on edges
- Adjust edge width based on tensor element count
- Intelligent port assignment for multi-input/output handling
- Color coding based on tensor identity

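Usage Example:

    import torch
    import torch.fx as fx

    from tensorrt_llm._torch.auto_deploy.transform.graph_module_visualizer import to_dot

    # Trace the model (YourModel is a placeholder for any traceable nn.Module)
    model = YourModel()
    traced = fx.symbolic_trace(model)

    # Generate the visualization; name and save_path are required by the to_dot
    # signature, and "model_graph" is only an illustrative value
    dot = to_dot(traced, name="model_graph", save_path="model_graph", format="svg", include_shapes=True)

Requirements:

    pip install graphviz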

tensorrt_llm._torch.auto_deploy.transform.graph_module_visualizer.to_dot(graph_module: GraphModule, name: str, save_path: str, format: str = 'svg', include_shapes: bool = True) → Digraph | None

Convert a PyTorch GraphModule to a Graphviz diagram.

Parameters:

graph_module – GraphModule to visualize
name – Name of the diagram
save_path – Save path, if None uses name
format – Output format ('png', 'pdf', 'svg', 'dot', etc.)
include_shapes – Whether to include tensor shape information

Returns:

graphviz.Digraph object, or None

tensorrt_llm._torch.auto_deploy.transform.graph_module_visualizer.analyze_graph_structure(graph_module: GraphModule) → Dict[str, Any]

Analyze structural statistics of a GraphModule.

Parameters: graph_module – GraphModule to analyze
Returns: Dictionary containing structural statistics